OVEREXPRESSION OF MAMMALIAN AND VIRAL PROTEINS
Field of the Invention
The invention concerns genes and methods for
expressing eukaryotic and viral proteins at high levels
in eukaryotic cells.

Background of the Invention
Expression of eukaryotic gene products in
prokaryotes is sometimes limited by the presence of
codons that are infrequently used in E. coli. Expression
of such genes can be enhanced by systematic substitution
of the endogenous codons with codons overrepresented in
highly expressed prokaryotic genes (Robinson et al.
1984). It is commonly supposed that rare codons cause
pausing of the ribosome, which leads to a failure to
complete the nascent polypeptide chain and an uncoupling
of transcription and translation. The mRNA 3' end of the
stalled ribosome is exposed to cellular ribonucleases,
which decreases the stability of the transcript.

Summary of the Invention

The invention features a synthetic gene encoding a
protein normally expressed in mammalian cells wherein at
least one non-preferred or less preferred codon in the
natural gene encoding the mammalian protein has been
replaced by a preferred codon encoding the same amino
acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn
(aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His
(cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe
(ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg).
Less preferred codons are: Gly (ggg); Ile (att); Leu
(ctc); Ser (tcc); Val (gtc). All codons which do not fit
the description of preferred codons or less preferred
codons are non-preferred codons.
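
The codon classes just listed can be illustrated with a short Python sketch. It is not part of the specification: the function names are hypothetical, and it assumes that codons for amino acids with no listed preferred codon (Glu, Met, Trp, stop) are simply left unchanged. It replaces every other codon with the preferred codon for the same amino acid and checks that the encoded protein is unaltered.

```python
# Standard genetic code, bases ordered t, c, a, g (one-letter amino acids).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Preferred codon per amino acid, transcribed from the list above.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
    "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
    "Y": "tac", "V": "gtg",
}
# Less preferred codons, transcribed from the list above.
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc"}


def split_codons(seq):
    """Split a coding sequence (DNA alphabet) into lower-case triplets."""
    seq = seq.lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify(codon):
    """Return the class of a codon according to the two lists above."""
    codon = codon.lower()
    if codon in PREFERRED.values():
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


def replace_with_preferred(seq):
    """Swap each codon for the preferred codon of the same amino acid.

    Assumption of this sketch: codons whose amino acid has no listed
    preferred codon are kept as they are.
    """
    return "".join(
        PREFERRED.get(CODON_TO_AA[codon], codon) for codon in split_codons(seq)
    )


def translate(seq):
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(seq))
```

For example, `replace_with_preferred("gcaagaattgaa")` rewrites the Ala, Arg and Ile codons and leaves the Glu codon alone, so `translate` returns the same peptide before and after.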

By "protein normally expressed in mammalian cells"
is meant a protein which is expressed in mammals under
natural conditions. The term includes genes in the
mammalian genome such as Factor VIII, Factor IX,
interleukins, and other proteins. The term also includes
genes which are expressed in a mammalian cell under
disease conditions, such as oncogenes, as well as genes
which are encoded by a virus (including a retrovirus)
which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic gene is
capable of expressing said mammalian protein at a level
which is at least 110%, 150%, 200%, 500%, 1,000%, or
10,000% of that expressed by said natural gene in an in
vitro mammalian cell culture system under identical
conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring
expression of the synthetic gene and corresponding
natural gene are described below. Other suitable
expression systems employing mammalian cells are well
known to those skilled in the art and are described in,
for example, the standard molecular biology reference
works noted below. Vectors suitable for expressing the
synthetic and natural genes are described below and in
the standard reference works described below. By
"expression" is meant protein expression. Expression can
be measured using an antibody specific for the protein of
interest. Such antibodies and measurement techniques are
well known to those skilled in the art. By "natural
gene" is meant the gene sequence which naturally encodes
the protein.

In other preferred embodiments at least 10%, 20%,
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the
natural gene are non-preferred codons.

In a preferred embodiment the protein is a
retroviral protein. In a more preferred embodiment the
protein is a lentiviral protein. In an even more
preferred embodiment the protein is an HIV protein. In
other preferred embodiments the protein is gag, pol, env,
gp120, or gp160. In other preferred embodiments the
protein is a human protein.

The invention also features a method for preparing
a synthetic gene encoding a protein normally expressed by
mammalian cells. The method includes identifying
non-preferred and less-preferred codons in the natural gene
encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred
codon encoding the same amino acid as the replaced codon.

Under some circumstances (e.g., to permit
introduction of a restriction site) it may be desirable
to replace a non-preferred codon with a less preferred
codon rather than a preferred codon.

It is not necessary to replace all less preferred
or non-preferred codons with preferred codons. Increased
expression can be accomplished even with partial
replacement.

In other preferred embodiments the invention
features vectors (including expression vectors)
comprising the synthetic gene.

By "vector" is meant a DNA molecule, derived,
e.g., from a plasmid, bacteriophage, or mammalian or
insect virus, into which fragments of DNA may be inserted
or cloned. A vector will contain one or more unique
restriction sites and may be capable of autonomous
replication in a defined host or vehicle organism such
that the cloned sequence is reproducible. Thus, by
"expression vector" is meant any autonomous element
capable of directing the synthesis of a protein. Such
DNA expression vectors include mammalian plasmids and
viruses.

The invention also features synthetic gene
fragments which encode a desired portion of the protein.
Such synthetic gene fragments are similar to the
synthetic genes of the invention except that they encode
only a portion of the protein. Such gene fragments
preferably encode at least 50, 100, 150, or 500
contiguous amino acids of the protein.

In constructing the synthetic genes of the
invention it may be desirable to avoid CpG sequences as
these sequences may cause gene silencing.
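
As an illustration of how CpG sequences might be located in a candidate synthetic gene, the following Python sketch (hypothetical helper names, not taken from the specification) lists every position at which a C is immediately followed by a G and reports whether the dinucleotide lies within one codon or spans two adjacent codons.

```python
def cpg_positions(seq):
    """Return 0-based start indices of every CpG (C followed by G) in seq."""
    seq = seq.lower()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "cg"]


def spans_codon_boundary(position):
    """True when the C is the third base of one codon and the G opens the next.

    Assumes the sequence starts in frame at index 0.
    """
    return position % 3 == 2
```

Under the listed preferred codons, `cpg_positions("gccggc")` (Ala followed by Gly) finds a CpG across the codon boundary, while `cpg_positions("gcccgcacc")` finds one inside the preferred Arg codon cgc, which is why avoiding CpG can require departing from the preferred codon in places.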

The codon bias present in the HIV gp120 envelope
gene is also present in the gag and pol proteins. Thus,
replacement of a portion of the non-preferred and less
preferred codons found in these genes with preferred
codons should produce a gene capable of higher level
expression. A large fraction of the codons in the human
genes encoding Factor VIII and Factor IX are
non-preferred codons or less preferred codons. Replacement
of a portion of these codons with preferred codons should
yield genes capable of higher level expression in
mammalian cell culture. Conversely, it may be desirable
to replace preferred codons in a naturally occurring gene
with less-preferred codons as a means of lowering
expression.

Standard reference works describing the general
principles of recombinant DNA technology include Watson,
J.D. et al., Molecular Biology of the Gene, Volumes I and
II, the Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell, J.E. et al.,
Molecular Cell Biology, Scientific American Books, Inc.,
publisher, New York, N.Y. (1986); Old, R.W., et al.,
Principles of Gene Manipulation: An Introduction to
Genetic Engineering, 2d edition, University of California
Press, publisher, Berkeley, CA (1981); Maniatis, T. et
al., Molecular Cloning: A Laboratory Manual, 2nd Ed.,
Cold Spring Harbor Laboratory, publisher, Cold Spring
Harbor, NY (1989); and Current Protocols in Molecular
Biology, Ausubel et al., Wiley Press, New York, NY
(1989).

Detailed Description
Description of the Drawings
Figure 1 depicts the sequence of the synthetic
gp120 (SEQ ID NO: 34) and a synthetic gp160 (SEQ ID NO:
35) gene in which codons have been replaced by those
found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic
gp120 (HIV-1 MN) gene. The shaded portions marked v1 to
v5 indicate hypervariable regions. The filled box
indicates the CD4 binding site. A limited number of the
unique restriction sites are shown: H (Hind3), Nh
(NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A
(AgeI), and No (NotI). The chemically synthesized DNA
fragments which served as PCR templates are shown below
the gp120 sequence, along with the locations of the
primers used for their amplification.

Figure 3 is a photograph of the results of
transient transfection assays used to measure gp120
expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids
expressing gp120 encoded by the IIIB isolate of HIV-1
(gp120IIIb), by the MN isolate (gp120mn), by the MN
isolate modified by substitution of the endogenous leader
peptide with that of the CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant
with the human CD5 leader (syngp120mn). Supernatants were
harvested following a 12 hour labeling period 60 hours
post-transfection and immunoprecipitated with CD4:IgG1
fusion protein and protein A sepharose.

Figure 4 is a graph depicting the results of ELISA
assays used to measure protein levels in supernatants of
transiently transfected 293T cells. Supernatants of 293T
cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120IIIb), by the MN
isolate (gp120mn), by the MN isolate modified by
substitution of the endogenous leader peptide with that
of CD5 antigen (gp120mnCD5L), or by the chemically
synthesized gene encoding the MN variant with human CD5
leader (syngp120mn) were harvested after 4 days and
tested in a gp120/CD4 ELISA. The level of gp120 is
expressed in ng/ml.

Figure 5, panel A is a photograph of a gel
illustrating the results of an immunoprecipitation assay
used to measure expression of the native and synthetic
gp120 in the presence of rev in trans and the RRE in cis.
In this experiment 293T cells were transiently
transfected by calcium phosphate coprecipitation of 10 µg
of plasmid expressing: (A) the synthetic gp120MN sequence
and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C)
the gp120 portion of HIV-1 IIIB and RRE in cis, all in
the presence or absence of rev expression. The RRE
constructs gp120IIIbRRE and syngp120mnRRE were generated
using an EagI/HpaI RRE fragment cloned by PCR from an
HIV-1 HXB2 proviral clone. Each gp120 expression plasmid
was cotransfected with 10 µg of either pCMVrev or CDM7
plasmid DNA. Supernatants were harvested 60 hours post
transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing
SDS-PAGE. The gel exposure time was extended to allow the
induction of gp120IIIbrre by rev to be demonstrated.
Figure 5, panel B is a shorter exposure of a similar
experiment in which syngp120mnrre was cotransfected with
or without pCMVrev. Figure 5, panel C is a schematic
diagram of the constructs used in panel A.

Figure 6 is a comparison of the sequence of the
wildtype rat THY-1 gene (wt) (SEQ. ID. NO: 37) and a
synthetic rat THY-1 gene (env) (SEQ. ID. NO: 36)
constructed by chemical synthesis and having the most
prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic
ratTHY-1 gene. The solid black box denotes the signal
peptide. The shaded box denotes the sequences in the
precursor which direct the attachment of a
phosphatidylinositol glycan anchor. Unique restriction sites used
for assembly of the THY-1 constructs are marked H
(Hind3), M (MluI), S (SacI) and No (NotI). The positions
of the synthetic oligonucleotides employed in the
construction are shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow
cytometry analysis. In this experiment 293T cells were
transiently transfected with either wildtype rat THY-1
(dark line), ratTHY-1 with envelope codons (light line)
or vector only (dotted line). 293T cells were
transfected with the different expression plasmids by
calcium phosphate coprecipitation and stained with
anti-ratTHY-1 monoclonal antibody OX7 followed by a polyclonal
FITC-conjugated anti-mouse IgG antibody 3 days after
transfection.

Figure 9, panel A is a photograph of a gel
illustrating the results of immunoprecipitation analysis
of supernatants of human 293T cells transfected with
either syngp120mn (A) or a construct syngp120mn.rTHY-1env
which has the rTHY-1env gene in the 3' untranslated
region of the syngp120mn gene (B). The
syngp120mn.rTHY-1env construct was generated by inserting
a NotI adapter into the blunted Hind3 site of the
rTHY-1env plasmid. Subsequently, a 0.5 kb NotI fragment
containing the rTHY-1env gene was cloned into the
NotI site of the syngp120mn plasmid and tested for
correct orientation. Supernatants of 35S labelled cells
were harvested 72 hours post transfection, precipitated
with CD4:IgG fusion protein and protein A agarose, and
run on a 7% reducing SDS-PAGE. Figure 9, panel B is a
schematic diagram of the constructs used in the
experiment depicted in panel A of this figure.

Description of the Preferred Embodiments

Construction of a Synthetic gp120 Gene Having Codons
Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor
of the LAV subtype of HIV-1 was generated using software
developed by the University of Wisconsin Genetics
Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by
a collection of highly expressed human genes. For any
amino acid encoded by degenerate codons, the most favored
codon of the highly expressed genes is different from the
most favored codon of the HIV envelope precursor.
Moreover, a simple rule describes the pattern of favored
envelope codons wherever it applies: preferred codons
maximize the number of adenine residues in the viral RNA.
In all cases but one this means that the codon in which
the third position is A is the most frequently used. In
the special case of serine, three codons equally
contribute one A residue to the mRNA; together these
three comprise 85% of the codons actually used in
envelope transcripts. A particularly striking example of
the A bias is found in the codon choice for arginine, in
which the AGA triplet comprises 88% of all arginine
codons. In addition to the preponderance of A residues, a
marked preference is seen for uridine among degenerate
codons whose third residue must be a pyrimidine. Finally,
the inconsistencies among the less frequently used
variants can be accounted for by the observation that the
dinucleotide CpG is underrepresented; thus the third
position is less likely to be G whenever the second
position is C, as in the codons for alanine, proline,
serine and threonine; and the CGX triplets for arginine
are hardly used at all.

WO 96/09378
PCT/US95/11511
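
The kind of tabulation described above can be sketched in a few lines of Python. This is only an illustration with hypothetical function names; it is not the GCG software, and it assumes an in-frame coding sequence written in the DNA alphabet. It counts codons, reports the share of codons ending in a given base, and gives the percent usage of each codon within a synonymous family such as arginine.

```python
from collections import Counter

# The six arginine codons (DNA alphabet).
ARG = ("cgt", "cgc", "cga", "cgg", "aga", "agg")


def codon_usage(seq):
    """Count each in-frame codon of a coding sequence."""
    seq = seq.lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def third_position_share(seq, base):
    """Fraction of codons whose third position is the given base."""
    counts = codon_usage(seq)
    total = sum(counts.values())
    ending = sum(n for codon, n in counts.items() if codon[2] == base.lower())
    return ending / total if total else 0.0


def synonymous_percent(seq, family):
    """Percent usage of each codon within one synonymous family."""
    counts = codon_usage(seq)
    total = sum(counts[codon] for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    return {codon: 100.0 * counts[codon] / total for codon in family}
```

Applied to a toy sequence such as `"agaagaagacgcgcaact"`, `synonymous_percent(seq, ARG)` reports how strongly one arginine codon dominates, which is the quantity quoted for AGA in the envelope gene.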

TABLE 1: Codon Frequency in the HIV-1 IIIb env gene
and in highly expressed human genes.

In order to produce a gp120 gene capable of high
level expression in mammalian cells, a synthetic gene
encoding the gp120 segment of HIV-1 was constructed
(syngp120mn), based on the sequence of the most common
North American subtype, HIV-1 MN (Shaw et al. 1984; Gallo
et al. 1986). In this synthetic gp120 gene nearly all of
the native codons have been systematically replaced with
codons most frequently used in highly expressed human
genes (FIG. 1). This synthetic gene was assembled from
chemically synthesized oligonucleotides of 150 to 200
bases in length. If oligonucleotides exceeding 120 to
150 bases are chemically synthesized, the percentage of
full-length product can be low, and the vast excess of
material consists of shorter oligonucleotides. Since
these shorter fragments inhibit cloning and PCR
procedures, it can be very difficult to use
oligonucleotides exceeding a certain length. In order to
use crude synthesis material without prior purification,
single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose
gels and used as templates in the next PCR step. Two
adjacent fragments could be co-amplified because of
overlapping sequences at the end of either fragment.
These fragments, which were between 350 and 400 bp in
size, were subcloned into a pCDM7-derived plasmid
containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI
polylinker. Each of the restriction enzymes in this
polylinker represents a site that is present at either
the 5' or 3' end of the PCR-generated fragments. Thus,
by sequential subcloning of each of the 4 long fragments,
the whole gp120 gene was assembled. For each fragment 3
to 6 different clones were subcloned and sequenced prior
to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in FIG. 2. The
sequence of the synthetic gp120 gene (and a synthetic
gp160 gene created using the same approach) is presented
in FIG. 1.
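
The role of the overlapping ends in joining adjacent fragments can be pictured with a purely in silico Python sketch. It does not model PCR or subcloning; the function names and the minimum overlap length are assumptions made for illustration. Two fragments are joined wherever the end of one matches the start of the next, and a list of fragments is merged in order.

```python
from functools import reduce


def merge_overlapping(left, right, min_overlap=4):
    """Join two fragments that share an overlapping end.

    The longest suffix of `left` that equals a prefix of `right` (at least
    `min_overlap` bases) is kept once in the merged sequence.
    """
    left, right = left.lower(), right.lower()
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("fragments do not overlap by the required length")


def assemble(fragments, min_overlap=4):
    """Merge an ordered list of overlapping fragments into one sequence."""
    return reduce(lambda acc, frag: merge_overlapping(acc, frag, min_overlap),
                  fragments)
```

With toy fragments, `assemble(["atggccaag", "caagctgacc", "gaccttctga"])` yields one contiguous sequence; a pair with no shared end raises `ValueError`, mirroring the requirement that adjacent fragments overlap.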

The mutation rate was considerable. The most
commonly found mutations were short (1 nucleotide) and
long (up to 30 nucleotides) deletions. In some cases it
was necessary to exchange parts with either synthetic
adapters or pieces from other subclones without mutation
in that particular region. Some deviations from strict
adherence to optimized codon usage were made to
accommodate the introduction of restriction sites into
the resulting gene to facilitate the replacement of
various segments (FIG. 2). These unique restriction sites
were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged
with the highly efficient leader peptide of the human CD5
antigen to facilitate secretion. The plasmid used for
construction is a derivative of the mammalian expression
vector pCDM7 transcribing the inserted gene under the
control of a strong human CMV immediate early promoter.

To compare the wild-type and synthetic gp120
coding sequences, the synthetic gp120 coding sequence was
inserted into a mammalian expression vector and tested in
transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations
in expression levels between different virus isolates and
artifacts induced by distinct leader sequences. The
gp120 HIV IIIb construct used as control was generated by
PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as
template. To exclude PCR induced mutations a KpnI/EarI
fragment containing approximately 1.2 kb of the gene was
exchanged with the respective sequence from the proviral
clone. The wildtype gp120mn constructs used as controls
were cloned by PCR from HIV-1 MN infected C8166 cells
(AIDS Repository, Rockville, MD) and expressed gp120
either with a native envelope or a CD5 leader sequence.
Since proviral clones were not available in this case,
two clones of each construct were tested to avoid PCR
artifacts. To determine the amount of secreted gp120
semi-quantitatively, supernatants of 293T cells
transiently transfected by calcium phosphate
coprecipitation were immunoprecipitated with soluble
CD4:immunoglobulin fusion protein and protein A
sepharose.

The results of this analysis (FIG. 3) show that
the synthetic gene product is expressed at a very high
level compared to that of the native gp120 controls. The
molecular weight of the synthetic gp120 gene product was
comparable to control proteins (FIG. 3) and appeared to
be in the range of 100 to 110 kd. The slightly faster
migration can be explained by the fact that in some tumor
cell lines like 293T glycosylation is either not complete
or altered to some extent.

To compare expression more accurately, gp120
protein levels were quantitated using a gp120 ELISA with
CD4 in the immobilized phase. This analysis shows (FIG.
4) that ELISA data were comparable to the
immunoprecipitation data, with a gp120 concentration of
approximately 125 ng/ml for the synthetic gp120 gene, and
less than the background cutoff (5 ng/ml) for all the
native gp120 genes. Thus, expression of the synthetic
gp120 gene appears to be at least one order of magnitude
higher than wildtype gp120 genes. In the experiment
shown the increase was at least 25 fold.
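
The 25-fold figure follows from the two ELISA values quoted above, as the small Python sketch below shows (the function name is hypothetical). Because the native genes gave readings below the 5 ng/ml background cutoff, dividing by the cutoff gives only a lower bound on the fold increase.

```python
def minimum_fold_increase(synthetic_ng_per_ml, background_cutoff_ng_per_ml):
    """Lower bound on fold increase when the comparator is below the cutoff."""
    return synthetic_ng_per_ml / background_cutoff_ng_per_ml


# Values quoted in the text: about 125 ng/ml versus a 5 ng/ml cutoff.
fold = minimum_fold_increase(125, 5)
```

Here `fold` evaluates to 25.0, i.e. "at least 25 fold".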

The Role of rev in gp120 Expression

Since rev appears to exert its effect at several
steps in the expression of a viral transcript, the
possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was
tested. First, to rule out the possibility that negative
signal elements conferring either increased mRNA
degradation or nuclear retention were eliminated by
changing the nucleotide sequence, cytoplasmic mRNA levels
were tested. Cytoplasmic RNA was prepared by NP40 lysis
of transiently transfected 293T cells and subsequent
elimination of the nuclei by centrifugation. Cytoplasmic
RNA was subsequently prepared from lysates by multiple
phenol extractions and precipitation, spotted on
nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA of 293 cells transfected
with CDMS, gp120 IIIB, or syngp120 was isolated 36 hours
post transfection. Cytoplasmic RNA of HeLa cells
infected with wildtype vaccinia virus or recombinant
virus expressing gp120 IIIb or the synthetic gp120 gene
under the control of the 7.5 promoter was isolated 16
hours post infection. Equal amounts were spotted on
nitrocellulose using a slot blot device and hybridized
with randomly labelled 1.5 kb gp120IIIb and syngp120
fragments or human beta-actin. RNA expression levels
were quantitated by scanning the hybridized membranes
with a phosphorimager. The procedures used are described
in greater detail below.

This experiment demonstrated that there was no
significant difference in the mRNA levels of cells
transfected with either the native or synthetic gp120
gene. In fact, in some experiments the cytoplasmic mRNA
level of the synthetic gp120 gene was even lower than
that of the native gp120 gene.

These data were confirmed by measuring expression
from recombinant vaccinia viruses. Human 293 cells or
HeLa cells were infected with vaccinia virus expressing
wildtype gp120 IIIb or syngp120mn at a multiplicity of
infection of at least 10. Supernatants were harvested 24
hours post infection and immunoprecipitated with
CD4:immunoglobulin fusion protein and protein A sepharose.
The procedures used in this experiment are described in
greater detail below.

This experiment showed that the increased
expression of the synthetic gene was still observed when
the endogenous gene product and the synthetic gene
product were expressed from vaccinia virus recombinants
under the control of the strong mixed early and late 7.5k
promoter. Because vaccinia virus mRNAs are transcribed
and translated in the cytoplasm, increased expression of
the synthetic envelope gene in this experiment cannot be
attributed to improved export from the nucleus. This
experiment was repeated in two additional human cell
types, the kidney cancer cell line 293 and HeLa cells.
As with transfected 293T cells, mRNA levels were similar
in 293 cells infected with either recombinant vaccinia
virus.




Codon Usage in Lentivirus

Because it appears that codon usage has a
significant impact on expression in mammalian cells, the
codon frequency in the envelope genes of other
retroviruses was examined. This study found no clear
pattern of codon preference between retroviruses in
general. However, if viruses from the lentivirus genus,
to which HIV-1 belongs, were analyzed separately,
codon usage bias almost identical to that of HIV-1 was
found. A codon frequency table from the envelope
glycoproteins of a variety of (predominantly type C)
retroviruses excluding the lentiviruses was prepared, and
compared with a codon frequency table created from the
envelope sequences of four lentiviruses not closely
related to HIV-1 (caprine arthritis encephalitis virus,
equine infectious anemia virus, feline immunodeficiency
virus, and visna virus) (Table 2). The codon usage
pattern for lentiviruses is strikingly similar to that of
HIV-1; in all cases but one, the preferred codon for
HIV-1 is the same as the preferred codon for the other
lentiviruses. The exception is proline, which is encoded
by CCT in 41% of non-HIV lentiviral envelope residues
and by CCA in 40% of residues, a situation which clearly
also reflects a significant preference for the triplet
ending in A. The pattern of codon usage by the
non-lentiviral envelope proteins does not show a similar
predominance of A residues, and is also not as skewed
toward third position C and G residues as is the codon
usage for the highly expressed human genes. In general
non-lentiviral retroviruses appear to exploit the
different codons more equally, a pattern they share with
less highly expressed human genes.




TABLE 2: Codon frequency in the envelope gene of
lentiviruses (lenti) and non-lentiviral
retroviruses (other).

Amino acid  Codon  Other         Lenti
Ala         GCC    45            13
            GCT    26            37
            GCA    20            46
            GCG    9             3
Arg         CGC    14            2
            CGT    6             3
            CGA    16            5
            CGG    17            3
            AGA    31            51
            AGG    15            26
Asn         AAC    49            31
            AAT    51            69
Asp         GAC    55            33
            GAT    51            69
Leu         CTC    22            8
            CTT    14            9
            CTA    21            16
            CTG    19            11
            TTA    15            41
            TTG    10            16
Lys         AAA    60            63
            AAG    40            37
Pro         CCC    42            14
            CCT    30            41
            CCA    20            40
            CCG    7             5
Phe         TTC    52            25
            TTT    48            75
Cys         TGC    [illegible]   [illegible]
            TGT    47            79
Gln         CAA    [illegible]   [illegible]
            CAG    [illegible]   [illegible]
Glu         GAA    [illegible]   68
            GAG    43            [illegible]
Gly         GGC    21            8
            GGT    13            9
            GGA    37            56
            GGG    29            26
His         CAC    51            38
            CAT    49            62
Ile         ATC    38            16
            ATT    31            22
            ATA    31            61
Ser         TCC    38            10
            TCT    17            16
            TCA    18            24
            TCG    6             5
            AGC    13            20
            AGT    7             25
Thr         ACC    44            18
            ACT    27            20
            ACA    19            55
            ACG    10            8
Tyr         TAC    48            28
            TAT    52            72
Val         GTC    36            9
            GTT    17            10
            GTA    22            54
            GTG    25            27

Codon frequency was calculated using the GCG program
established by the University of Wisconsin Genetics
Computer Group. Numbers represent the percentage of cases
in which a particular codon is used. Codon usage of
non-lentiviral retroviruses was compiled from the envelope
precursor sequences of bovine leukemia virus, feline
leukemia virus, human T-cell leukemia virus type I, human
T-cell lymphotropic virus type II, the mink cell
focus-forming isolate of murine leukemia virus (MuLV), the
Rauscher spleen focus-forming isolate, the 10A isolate,
the 4070A amphotropic isolate and the myeloproliferative
leukemia virus isolate, and from rat leukemia virus,
simian sarcoma virus, simian T-cell leukemia virus,
leukemogenic retrovirus T1223/B and gibbon ape leukemia
virus. The codon frequency tables for the non-HIV,
non-SIV lentiviruses were compiled from the envelope
precursor sequences for caprine arthritis encephalitis
virus, equine infectious anemia virus, feline
immunodeficiency virus, and visna virus.

In addition to the prevalence of A containing
codons, lentiviral codons adhere to the HIV pattern of
strong CpG underrepresentation, so that the third
position for alanine, proline, serine and threonine
triplets is rarely G. The retroviral envelope triplets
show a similar, but less pronounced, underrepresentation
of CpG. The most obvious difference between lentiviruses
and other retroviruses with respect to CpG prevalence
lies in the usage of the CGX variant of arginine
triplets, which is reasonably frequently represented
among the retroviral envelope coding sequences, but is
almost never present among the comparable lentivirus
sequences.
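
The contrast in CGX arginine usage can be checked directly against the percentages in Table 2. The Python sketch below is illustrative only: the dictionaries hold the (other, lenti) values as transcribed from that table, and the function name is hypothetical.

```python
# Percent usage transcribed from Table 2 as (other, lenti).
ARG_USAGE = {
    "CGC": (14, 2), "CGT": (6, 3), "CGA": (16, 5), "CGG": (17, 3),
    "AGA": (31, 51), "AGG": (15, 26),
}
# Ala, Pro, Ser and Thr codons with C in position 2 and G in position 3.
NCG_USAGE = {"GCG": (9, 3), "CCG": (7, 5), "TCG": (6, 5), "ACG": (10, 8)}


def cgx_share(usage):
    """Sum the percent usage of codons starting with CG, per virus group."""
    other = sum(o for codon, (o, _) in usage.items() if codon.startswith("CG"))
    lenti = sum(l for codon, (_, l) in usage.items() if codon.startswith("CG"))
    return other, lenti
```

`cgx_share(ARG_USAGE)` gives 53 for the non-lentiviral retroviruses against 13 for the lentiviruses, and each NCG codon in `NCG_USAGE` is used less often in the lentiviral column than in the other column.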

Differences in rev Dependence Between Native and
Synthetic gp120

To examine whether regulation by rev is connected
to HIV-1 codon usage, the influence of rev on the
expression of both native and synthetic gene was
investigated. Since regulation by rev requires the
rev-binding site RRE in cis, constructs were made in which
this binding site was cloned into the 3' untranslated
region of both the native and the synthetic gene. These
plasmids were co-transfected with rev or a control
plasmid in trans into 293T cells, and gp120 expression
levels in supernatants were measured semiquantitatively
by immunoprecipitation. The procedures used in this
experiment are described in greater detail below.

As shown in FIG. 5, panels A and B, rev
upregulates the native gp120 gene, but has no effect on
the expression of the synthetic gp120 gene. Thus, the
action of rev is not apparent on a substrate which lacks
the coding sequence of endogenous viral envelope
sequences.

Expression of a synthetic rat THY-1 gene with HIV
envelope codons

The above-described experiments suggest that in
fact "envelope sequences" have to be present for rev
regulation. In order to test this hypothesis, a
synthetic version of the gene encoding the small,
typically highly expressed cell surface protein, rat
THY-1 antigen, was prepared. The synthetic version of
the rat THY-1 gene was designed to have a codon usage
like that of HIV gp120. In designing this synthetic gene
AUUUA sequences, which are associated with mRNA
instability, were avoided. In addition, two restriction
sites were introduced to simplify manipulation of the
resulting gene (FIG. 6). This synthetic gene with the
HIV envelope codon usage (rTHY-1env) was generated using
three 150 to 170 mer oligonucleotides (FIG. 7). In
contrast to the syngp120mn gene, PCR products were
directly cloned and assembled in pUC12, and subsequently
cloned into pCDM7.
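
A design check for AUUUA sequences of the kind mentioned above could look like the following Python sketch. The function name is hypothetical and the approach is an assumption for illustration: the sequence is read as RNA (T is treated as U) and every occurrence of the motif, including overlapping ones, is reported.

```python
def find_auuua(seq):
    """Return 0-based start indices of every AUUUA motif in seq.

    Accepts DNA or RNA letters in either case; T is treated as U.
    Overlapping occurrences are all reported.
    """
    rna = seq.lower().replace("t", "u")
    motif = "auuua"
    return [i for i in range(len(rna) - len(motif) + 1)
            if rna.startswith(motif, i)]
```

An empty list from `find_auuua(candidate)` indicates that the candidate sequence contains no AUUUA motif; a non-empty list gives the positions where a synonymous codon change would be needed.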

Expression levels of native rTHY-1 and rTHY-1 with
the HIV envelope codons were quantitated by
immunofluorescence of transiently transfected 293T cells.
FIG. 8 shows that the expression of the native THY-1 gene
is almost two orders of magnitude above the background
level of the control transfected cells (pCDM7). In
contrast, expression of the synthetic rat THY-1 is
substantially lower than that of the native gene (shown
by the shift of the peak towards a lower channel
number).

To prove that no negative sequence elements
promoting mRNA degradation were inadvertently introduced,
a construct was generated in which the rTHY-1env gene was
cloned at the 3' end of the synthetic gp120 gene (FIG. 9,
panel B). In this experiment 293T cells were transfected
with either the syngp120mn gene or the syngp120/rat THY-1
env fusion gene (syngp120mn.rTHY-1env). Expression was
measured by immunoprecipitation with CD4:IgG fusion
protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop
codon, rTHY-1env is not translated from this transcript.
If negative elements conferring enhanced degradation were
present in the sequence, gp120 protein levels expressed
from this construct should be decreased in comparison to
the syngp120mn construct without rTHY-1env. FIG. 9,
panel A, shows that the expression of both constructs is
similar, indicating that the low expression must be
linked to translation.

Rev-dependent expression of synthetic rat THY-1
gene with envelope codons

To explore whether rev is able to regulate
expression of a rat THY-1 gene having env codons, a
construct was made with a rev-binding site in the 3' end
of the rTHY-1env open reading frame. To measure the
rev-responsiveness of the rat THY-1env construct having a
3' RRE, human 293T cells were cotransfected with
ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours
post transfection cells were detached with 1 mM EDTA in
PBS and stained with the OX-7 anti rTHY-1 mouse
monoclonal antibody and a secondary FITC-conjugated
antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described
in greater detail below.

In repeated experiments, a slight increase of
rTHY-1env expression was detected if rev was
cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system a construct
expressing a secreted version of rTHY-1env was generated.
This construct should produce more reliable data because
the accumulated amount of secreted protein in the
supernatant reflects the result of protein production
over an extended period, in contrast to surface expressed
protein, which appears to more closely reflect the
current production rate. A gene capable of expressing a
secreted form was prepared by PCR using forward and
reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for
phosphatidylinositol glycan anchorage respectively. The
PCR product was cloned into a plasmid which already
contained a CD5 leader sequence, thus generating a
construct in which the membrane anchor has been deleted
and the leader sequence exchanged by a heterologous (and
probably more efficient) leader peptide.
The rev-responsiveness of the secreted form of
ratTHY-1env was measured by immunoprecipitation of
supernatants of human 293T cells cotransfected with a
plasmid expressing a secreted form of ratTHY-1env and the
RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or
pCMVrev. The rTHY-1envPI-RRE construct was made by PCR
using the oligonucleotides
cgcggggctagcgcaaagagtaataagtttaac as forward and
cgcggatcccttgtattttgtactaata a as reverse primers and the
synthetic rTHY-1env construct as template. After
digestion with NheI and NotI the PCR fragment was cloned
into a plasmid containing CD5 leader and RRE sequences.
Supernatants of 35S labelled cells were harvested 72
hours post transfection, precipitated with a mouse
monoclonal antibody OX7 against rTHY-1 and anti mouse IgG
sepharose, and run on a 12% reducing SDS-PAGE.

In this experiment the induction of rTHY-1env by
rev was much more prominent and clearcut than in the
above-described experiment and strongly suggests that rev
is able to translationally regulate transcripts that are
suppressed by low-usage codons.

Rev-independent expression of a rTHY-1env:immunoglobulin
fusion protein

To test whether low-usage codons must be present
throughout the whole coding sequence or whether a short
region is sufficient to confer rev-responsiveness, a
rTHY-1env:immunoglobulin fusion protein was generated.
In this construct the rTHY-1env gene (without the
sequence motif responsible for phosphatidylinositol
glycan anchorage) is linked to the human IgG1 hinge, CH2
and CH3 domains. This construct was generated by anchor
PCR using primers with NheI and BamHI restriction sites
and rTHY-1env as template. The PCR fragment was cloned
into a plasmid containing the leader sequence of the CD5
surface molecule and the hinge, CH2 and CH3 parts of
human IgG1 immunoglobulin. A Hind3/EagI fragment
containing the rTHY-1enveg1 insert was subsequently
cloned into a pCDM7-derived plasmid with the RRE
sequence.

To measure the response of the rTHY-1env/
immunoglobulin fusion gene (rTHY-1enveg1rre) to rev, human
293T cells were cotransfected with rTHY-1enveg1rre and either
pCDM7 or pCMVrev. The rTHY-1enveg1rre construct was made
by anchor PCR using forward and reverse primers with NheI
and BamHI restriction sites respectively. The PCR
fragment was cloned into a plasmid containing a CD5
leader and human IgG1 hinge, CH2 and CH3 domains.
Supernatants of 35S labelled cells were harvested 72 hours
post transfection, precipitated with a mouse monoclonal
antibody OX7 against rTHY-1 and anti mouse IgG sepharose,
and run on a 12% reducing SDS-PAGE. The procedures used
are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into
the supernatant. Thus, this gene should be responsive to
rev-induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no or only a
negligible increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion
protein with native rTHY-1 or HIV envelope codons was
measured by immunoprecipitation. Briefly, human 293T
cells were transfected with either rTHY-1enveg1 (env codons)
or rTHY-1wteg1 (native codons). The rTHY-1wteg1
construct was generated in a manner similar to that used
for the rTHY-1enveg1 construct, with the exception that a
plasmid containing the native rTHY-1 gene was used as
template. Supernatants of 35S labelled cells were
harvested 72 hours post transfection, precipitated with a
mouse monoclonal antibody OX7 against rTHY-1 and anti
mouse IgG sepharose, and run on a 12% reducing SDS-PAGE.
The procedures used in this experiment are described in
greater detail below.

Expression levels of rTHY-1enveg1 were decreased
in comparison to a similar construct with wildtype rTHY-1
as the fusion partner, but were still considerably higher
than rTHY-1env. Accordingly, both parts of the fusion
protein influenced expression levels. The addition of
rTHY-1env did not restrict expression to an equal level
as seen for rTHY-1env alone. Thus, regulation by rev
appears to be ineffective if protein expression is not
almost completely suppressed.

Codon preference in HIV-1 envelope gene

Direct comparison between codon usage frequency of
HIV envelope and highly expressed human genes reveals a
striking difference for all twenty amino acids. One
simple measure of the statistical significance of this
codon preference is the finding that among the nine amino
acids with two fold codon degeneracy, the favored third
residue is A or U in all nine. The probability that all
nine of two equiprobable choices will be the same is
approximately 0.004, and hence by any conventional
measure the third residue choice cannot be considered
random. Further evidence of a skewed codon preference is
found among the more degenerate codons, where a strong
selection for triplets bearing adenine can be seen. This
contrasts with the pattern for highly expressed genes,
which favor codons bearing C, or less commonly G, in the
third position of codons with three or more fold
degeneracy.
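The figure of approximately 0.004 quoted above can be reproduced with a short calculation. This is a sketch under the assumption stated in the text (two equiprobable, independent choices for each of nine amino acids); the function name prob_all_same is hypothetical.

```python
from fractions import Fraction


def prob_all_same(n_choices, n_options=2):
    """Probability that n independent, equiprobable choices all agree.

    Any one of the n_options outcomes may be the shared one, hence the
    leading factor of n_options.
    """
    return n_options * Fraction(1, n_options) ** n_choices


p = prob_all_same(9)          # nine amino acids with two fold degeneracy
p_rounded = round(float(p), 3)
```

Here p evaluates to exactly 1/256, which rounds to the 0.004 stated in the text.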

The systematic exchange of native codons with
codons of highly expressed human genes dramatically
increased expression of gp120. A quantitative analysis
by ELISA showed that expression of the synthetic gene was
at least 25 fold higher in comparison to native gp120
after transient transfection into human 293 cells. The
concentration levels in the ELISA experiment shown were
rather low. Since an ELISA was used for quantification
which is based on gp120 binding to CD4, only native, non-
denatured material was detected. This may explain the
apparent low expression. Measurement of cytoplasmic mRNA
levels demonstrated that the difference in protein
expression is due to translational differences and not
mRNA stability.

Retroviruses in general do not show a similar
preference towards A and T as found for HIV. But if this
family was divided into two subgroups, lentiviruses and
non-lentiviral retroviruses, a similar preference to A
and, less frequently, T, was detected at the third codon
position for lentiviruses. Thus, the available evidence
suggests that lentiviruses retain a characteristic
pattern of envelope codons not because of an inherent
advantage to the reverse transcription or replication of
such residues, but rather for some reason peculiar to the
physiology of that class of viruses. The major
difference between lentiviruses and non-complex
retroviruses is the presence of additional regulatory and
non-essential accessory genes in lentiviruses, as already
mentioned. Thus, one simple explanation for the
restriction of envelope expression might be that an
important regulatory mechanism of one of these additional
molecules is based on it. In fact, this is known to be the
case for one of these proteins, rev, which most likely has
homologues in all lentiviruses. Thus codon usage in viral
mRNA is used to create a class of transcripts which is
susceptible to the stimulatory action of rev. This
hypothesis was proved using a similar strategy as above,
but this time codon usage was changed in the inverse
direction. Codon usage of a highly expressed cellular
gene was substituted with the most frequently used codons
in the HIV envelope. As assumed, expression levels were
considerably lower in comparison to the native molecule,
almost two orders of magnitude when analyzed by
immunofluorescence of the surface expressed molecule (see
4.7). If rev was coexpressed in trans and a RRE element
was present in cis only a slight induction was found for
the surface molecule. However, if THY-1 was expressed as
a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can
probably be explained by accumulation of secreted protein
in the supernatant, which considerably amplifies the rev
effect. If rev only induces a minor increase for surface
molecules in general, induction of HIV envelope by rev
cannot have the purpose of an increased surface
abundance, but rather of an increased intracellular gp160
level. It is completely unclear at the moment why this
should be the case.

To test whether small subtotal elements of a gene
are sufficient to restrict expression and render it rev-
dependent, rTHY-1env:immunoglobulin fusion proteins were
generated, in which only about one third of the total
gene had the envelope codon usage. Expression levels of
this construct were on an intermediate level, indicating
that the rTHY-1env negative sequence element is not
dominant over the immunoglobulin part. This fusion
protein was not or only slightly rev-responsive,
indicating that only genes almost completely suppressed
can be rev-responsive.

Another characteristic feature that was found in
the codon frequency tables is a striking
underrepresentation of CpG triplets. In a comparative
study of codon usage in E. coli, yeast, Drosophila and
primates it was shown that in a high number of analyzed
primate genes the 8 least used codons contain all codons
with the CpG dinucleotide sequence. Avoidance of codons
containing this dinucleotide motif was also found in the
sequence of other retroviruses. It seems plausible that
the reason for underrepresentation of CpG-bearing
triplets has something to do with avoidance of gene
silencing by methylation of CpG cytosines. The observed
number of CpG dinucleotides for HIV as a whole is about
one fifth that expected on the basis of the base
composition. This might indicate that the possibility of
high expression is restored, and that the gene in fact
has to be highly expressed at some point during viral
pathogenesis.
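The comparison of CpG frequency with the frequency expected from base composition can be illustrated as follows. This is a minimal sketch, assuming the common approximation that the expected CpG count equals (number of C) x (number of G) / sequence length; the function name cpg_obs_exp and the two toy sequences are hypothetical and are not HIV data.

```python
def cpg_obs_exp(seq):
    """Return (observed CpG count, expected CpG count, observed/expected ratio).

    Expected count uses the approximation count(C) * count(G) / length,
    i.e. it assumes bases are placed independently of their neighbours.
    """
    s = seq.replace(" ", "").lower()
    n = len(s)
    observed = sum(1 for i in range(n - 1) if s[i:i + 2] == "cg")
    expected = s.count("c") * s.count("g") / n
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio


# Toy examples only: one CpG-rich and one CpG-depleted sequence.
rich = cpg_obs_exp("acgtacgt")
depleted = cpg_obs_exp("ggccggcc")
```

A ratio well below 1, such as the value of about one fifth reported for HIV, indicates that CpG occurs less often than base composition alone would predict.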

The results presented herein clearly indicate that
codon preference has a severe effect on protein levels,
and suggest that translational elongation is controlling
mammalian gene expression. However, other factors may
play a role. First, abundance of not maximally loaded
mRNAs in eukaryotic cells indicates that initiation is
rate limiting for translation in at least some cases,
since otherwise all transcripts would be completely
covered by ribosomes. Furthermore, if ribosome stalling
and subsequent mRNA degradation were the mechanism,
suppression by rare codons could most likely not be
reversed by any regulatory mechanism like the one
presented herein. One possible explanation for the
influence of both initiation and elongation on
translational activity is that the rate of initiation, or
access to ribosomes, is controlled in part by cues
distributed throughout the RNA, such that the lentiviral
codons predispose the RNA to accumulate in a pool of
poorly initiated RNAs. However, this limitation need not
be kinetic; for example, the choice of codons could
influence the probability that a given translation
product, once initiated, is properly completed. Under
this mechanism, abundance of less favored codons would
incur a significant cumulative probability of failure to
complete the nascent polypeptide chain. The sequestered
RNA would then be lent an improved rate of initiation by
the action of rev. Since adenine residues are abundant
in rev-responsive transcripts, it could be that RNA
adenine methylation mediates this translational
suppression.

Detailed Procedures

The following procedures were used in the above-
described experiments.

Sequence analysis

Sequence analyses employed the software developed
by the University of Wisconsin Computer Group.
Plasmid constructions

Plasmid constructions employed the following
methods. Vectors and insert DNA were digested at a
concentration of 0.5 µg/10 µl in the appropriate
restriction buffer for 1 - 4 hours (total reaction volume
approximately 30 µl). Digested vector was treated with
10% (v/v) of 1 µg/ml calf intestine alkaline phosphatase
for 30 min prior to gel electrophoresis. Both vector and
insert digests (5 to 10 µl each) were run on a 1.5% low
melting agarose gel with TAE buffer. Gel slices
containing bands of interest were transferred into a 1.5
ml reaction tube, melted at 65°C and directly added to
the ligation without removal of the agarose. Ligations
were typically done in a total volume of 25 µl in 1x Low
Buffer 1x Ligation Additions with 200-400 U of ligase, 1
µl of vector, and 4 µl of insert. When necessary, 5'
overhanging ends were filled by adding 1/10 volume of 250
µM dNTPs and 2-5 U of Klenow polymerase to heat
inactivated or phenol extracted digests and incubating
for approximately 20 min at room temperature. When
necessary, 3' overhanging ends were filled by adding 1/10
volume of 2.5 mM dNTPs and 5-10 U of T4 DNA polymerase to
heat inactivated or phenol extracted digests, followed by
incubation at 37°C for 30 min. The following buffers
were used in these reactions: 10x Low buffer (60 mM Tris
HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60 mM
Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA,
70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60
mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml
BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Ligation
additions (1 mM ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM
spermidine); 50x TAE (2 M Tris acetate, 50 mM EDTA).
Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750
synthesizer (Millipore). The columns were eluted with 1
ml of 30% ammonium hydroxide, and the eluted
oligonucleotides were deblocked at 55°C for 6 to 12
hours. After deblocking, 150 µl of oligonucleotide were
precipitated with 10x volume of unsaturated n-butanol in
1.5 ml reaction tubes, followed by centrifugation at
15,000 rpm in a microfuge. The pellet was washed with
70% ethanol and resuspended in 50 µl of H2O. The
concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:333 (1 OD260 = 30
µg/ml).
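The concentration determination described above amounts to multiplying the absorbance reading by the dilution factor and by the conversion factor. A minimal sketch follows, using the 1:333 dilution and the conversion of 1 OD260 = 30 µg/ml given in the text; the function name and the example reading of 0.10 are hypothetical.

```python
def concentration_ug_per_ml(od260, dilution_factor, ug_per_ml_per_od):
    """Stock concentration from an A260 reading taken on a diluted sample."""
    return od260 * dilution_factor * ug_per_ml_per_od


# Values from the text: 1:333 dilution, 1 OD260 = 30 ug/ml for oligonucleotides.
OLIGO_DILUTION = 333
OLIGO_UG_PER_OD = 30

# Hypothetical reading of 0.10 OD on the diluted sample.
oligo_conc = concentration_ug_per_ml(0.10, OLIGO_DILUTION, OLIGO_UG_PER_OD)
```

With these factors a reading of 0.10 corresponds to roughly 1 mg/ml of oligonucleotide in the undiluted stock, which may explain the choice of a 1:333 dilution.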

The following oligonucleotides were used for
construction of the synthetic gp120 gene (all sequences
shown in this text are in 5' to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag
aag ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac
ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc
gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg
tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag
gag gtg gag ctc gtg aac gtg acc gag aac ttc aac atg (SEQ
ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt
tga agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa
gaa caa cat (SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat
gag gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc
gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg
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a,c aac agc gag ggc acc atc aag ^ ^ ^ ^ ^ ^ 

5 cat , OU9 ° 2 CeVerSe (PStl,: gtt get gca gtt ctt 

5 cat etc gee gee ctt (SEQ ID NO: 6 ) . 

oligo 3 forward (P Stl) . gaa gaa 
cat cac cac cag c (SEQ id NO: 7) . 

oligo 3: aac atc acc acc agc atc cgc gac aao ate 
« atc gac aac gac agc acc * J g 

gag ccc atc ccc atc cac tar- 
C«0 10 »0, ., . 

oligo 3 reverse: gaa ctt ctt a+n „„„ 
" «c ggg (SEQ ID NO: 9) . ? " 9 " 9CC 

to. . * 0li9 ° 4 f ° rWard: 9C9 CCC CCg CC * * Ct cea tec 

tga agt gea aeg aea aga agt te (SEQ ID NO: 10) 

oligo 4: gcc gac aag aag 

^ a9C aCC ct <= ctg etg aac ggc age ctg gel 

«-* 9 ag gag gtg gtg atc cgc agc gag J 9cc 

;:; aa : ;rr atc gtg cac ct * - - - - - - 

» - - - U^SE^NO^r ^ " ^ C ^ 
oligo 5 forward (MluI): gag agc gtg cag atc aac
tgc acg cgt ccc (SEQ ID NO: 13).

oligo 5: aac tgc acg cgt ccc aac tac aac aag cgc
ag cgc atc cac atc ggc ccc ggg cgc gcc ttc tac acc acc
aag aac atc atc ggc acc atc ctc cag gcc cac tgc aac atc
tct aga (SEQ ID NO: 14).

oligo 5 reverse: atc att cca r-t-i- - ► 

y gtt cca ctt ggc tct aga gat 

gtt g Ca (SEQ ID NO: 15) . 

oligo 6 forward: gca aca tct cta gag cca agt gga
acg ac (SEQ ID NO: 16).

oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc
gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg
ttc ac cag agc agc ggc ggc gac ccc gag atc gtg atg cac
agc ttc aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc
gcc gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat
tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc
ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac
acc acc ggc agc aac aac aat att acc ctc cag tgc aag atc
aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg
tac gcc ccc ccc atc gag ggc cag atc cgg tgc agc agc (SEQ
ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc
acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag
caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac
ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc
ccc ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg
tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc
ccc acc aag gcc aag cgc cgc gtg gtg cag cgc gag aag cgc
(SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag
cgc ttc tcg cgc tgc acc ac (SEQ ID NO: 24).

The following oligonucleotides were used for the
construction of the ratTHY-1env gene.

oligo 1 forward (BamHI/Hind3): cgc ggg gga tcc
aag ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta
tta agt gta tta caa atg agt aga gga caa aga gta ata agt
tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat tt, cca ata caa cat gaa ttt
tca tta acg (SEQ ID NO: 26).

oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg
cgt taa tga aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt
gaa aaa aaa aaa cat (SEQ ID NO: 28).

oligo 2: cgt gaa aaa aaa aaa cat gta tta agt gga
aca tta gga gta cca gaa cat aca tat aga agt aga gta aat
ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat
ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID
NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc
aca cat ata atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc
aga gta agt gga caa (SEQ ID NO: 31).

oligo 3: ctc aga gta agt gga caa aat cca aca agt
agt aat aaa aca ata aat gta ata aga gat aaa tta gta aaa
tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta
tta tta tta tta tta agt tta agt ttt tta caa gca aca gat
ttt ata agt tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc
gct tca taa act tat aaa atc (SEQ ID NO: 33).
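The short primers listed above for the gp120 gene bridge adjacent long oligonucleotides, which is what allows neighbouring fragments to be joined in a second PCR. The sketch below checks this for the junction between gp120 oligo 1 and oligo 2. It is an illustration only: the helper names are hypothetical, only the last 21 bases of oligo 1 and the first 24 bases of oligo 2 are used, and the sequences are read with the scanning error "e" for "c" corrected (for example "tec" read as "tcc").

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def clean(seq):
    """Strip spaces and fold case so sequences can be compared directly."""
    return seq.replace(" ", "").lower()


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(seq)))


# End of gp120 oligo 1 and start of gp120 oligo 2 (from the list above).
OLIGO1_TAIL = clean("gtg acc gag aac ttc aac atg")
OLIGO2_HEAD = clean("tgg aag aac aac atg gtg gag cag")

# Bridging primers (from the list above).
OLIGO2_FORWARD = clean("gac cga gaa ctt caa cat gtg gaa gaa caa cat")
OLIGO1_REVERSE = clean("cca cca tgt tgt tct tcc aca tgt tga agt tct c")

junction = OLIGO1_TAIL + OLIGO2_HEAD

# Both positions are indices into the junction; -1 would mean "no match".
forward_pos = junction.find(OLIGO2_FORWARD)
reverse_pos = junction.find(reverse_complement(OLIGO1_REVERSE))
```

Both primers are found within the joined sequence, so each spans the boundary between the two long oligonucleotides: the forward primer of oligo 2 begins inside oligo 1, and the reverse primer of oligo 1 ends inside oligo 2.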
Polymerase Chain Reaction

Short, overlapping 15 to 25 mer oligonucleotides
annealing at both ends were used to amplify the long
oligonucleotides by polymerase chain reaction (PCR).
Typical PCR conditions were: 35 cycles, 55°C annealing
temperature, 0.2 sec extension time. PCR products were
gel purified, phenol extracted, and used in a subsequent
PCR to generate longer fragments consisting of two
adjacent small fragments. These longer fragments were
cloned into a CDM7-derived plasmid containing a leader
sequence of the CD5 surface molecule followed by a
NheI/PstI/MluI/EcoRI/BamHI polylinker.

The following solutions were used in these
reactions: 10x PCR buffer (500 mM KCl, 100 mM Tris HCl,
pH 7.5, 8 mM MgCl2, 2 mM each dNTP). The final buffer
was complemented with 10% DMSO to increase fidelity of
the Taq polymerase.
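The working concentrations implied by the PCR buffer recipe above follow from a simple dilution. The sketch below assumes the stock is a 10x concentrate used at 1x; the function names and the 50 µl reaction volume are hypothetical, since the text does not state a reaction volume.

```python
# Stock composition in mM, as given in the text for the 10x PCR buffer.
PCR_BUFFER_10X_MM = {
    "KCl": 500.0,
    "Tris HCl": 100.0,
    "MgCl2": 8.0,
    "each dNTP": 2.0,
}


def dilute(stock_mm, fold):
    """Final concentrations (mM) after diluting a stock 'fold' times."""
    return {name: conc / fold for name, conc in stock_mm.items()}


def stock_volume_ul(reaction_volume_ul, fold):
    """Volume of concentrated stock needed for a reaction of the given size."""
    return reaction_volume_ul / fold


final_mm = dilute(PCR_BUFFER_10X_MM, 10)
buffer_ul = stock_volume_ul(50, 10)   # hypothetical 50 ul reaction
dmso_ul = 50 * 10 / 100               # 10% DMSO (v/v) in the same reaction
```

At 1x this gives 50 mM KCl, 10 mM Tris HCl, 0.8 mM MgCl2 and 0.2 mM of each dNTP.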

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB
cultures for more than 6 hours or overnight.
Approximately 1.5 ml of each culture was poured into 1.5
ml microfuge tubes, spun for 20 seconds to pellet cells
and resuspended in 200 µl of solution I. Subsequently
400 µl of solution II and 300 µl of solution III were
added. The microfuge tubes were capped, mixed and spun
for > 30 sec. Supernatants were transferred into fresh
tubes and phenol extracted once. DNA was precipitated by
filling the tubes with isopropanol, mixing, and spinning
in a microfuge for > 2 min. The pellets were rinsed in
70% ethanol and resuspended in 50 µl dH2O containing 10
µl of RNAse A. The following media and solutions were
used in these procedures: LB medium (1.0% NaCl, 0.5%
yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH
8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III
(2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH
adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl,
pH 7.5, 1 mM EDTA pH 8.0).
Large scale DNA preparation

One liter cultures of transformed bacteria were
grown 24 to 36 hours (MC1061p3 transformed with pCDM
derivatives) or 12 to 16 hours (MC1061 transformed with
pUC derivatives) at 37°C in either M9 bacterial medium
(pCDM derivatives) or LB (pUC derivatives). Bacteria
were spun down in 1 liter bottles using a Beckman J6
centrifuge at 4,200 rpm for 20 min. The pellet was
resuspended in 40 ml of solution I. Subsequently, 80 ml
of solution II and 40 ml of solution III were added and
the bottles were shaken semivigorously until lumps of 2
to 3 mm size developed. The bottle was spun at 4,200 rpm
for 5 min and the supernatant was poured through
cheesecloth into a 250 ml bottle. Isopropanol was added
to the top and the bottle was spun at 4,200 rpm for 10
min. The pellet was resuspended in 4.1 ml of solution I
and added to 4.5 g of cesium chloride, 0.3 ml of 10 mg/ml
ethidium bromide, and 0.1 ml of 1% Triton X100 solution.
The tubes were spun in a Beckman J2 high speed centrifuge
at 10,000 rpm for 5 min. The supernatant was transferred
into Beckman Quick Seal ultracentrifuge tubes, which were
then sealed and spun in a Beckman ultracentrifuge using a
NVT90 fixed angle rotor at 80,000 rpm for > 2.5 hours.
The band was extracted under visible light using a 1 ml
syringe and 20 gauge needle. An equal volume of dH2O was
added to the extracted material. DNA was extracted once
with n-butanol saturated with 1 M sodium chloride,
followed by addition of an equal volume of 10 M ammonium
acetate/1 mM EDTA. The material was poured into a 13 ml
snap tube which was then filled to the top with absolute
ethanol, mixed, and spun in a Beckman J2 centrifuge at
10,000 rpm for 10 min. The pellet was rinsed with 70%
ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA
concentration was determined by measuring the optical
density at 260 nm in a dilution of 1:200 (1 OD260 = 50
µg/ml).

The following media and buffers were used in these
procedures: M9 bacterial medium (10 g M9 salts, 10 g
casamino acids (hydrolysed), 10 ml M9 additions, 7.5
µg/ml tetracycline (500 µl of a 15 mg/ml stock solution),
12.5 µg/ml ampicillin (125 µl of a 10 mg/ml stock
solution)); M9 additions (10 mM CaCl2, 100 mM MgSO4, 200
µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl, 0.5%
yeast extract, 1.0% tryptone); Solution I (10 mM EDTA
pH 8.0); Solution II (0.2 M NaOH, 1.0% SDS); Solution III
(2.5 M KOAc, 2.5 M HOAc).

Sequencing

Synthetic genes were sequenced by the Sanger
dideoxynucleotide method. In brief, 20 to 50 µg double-
stranded plasmid DNA were denatured in 0.5 M NaOH for 5
min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of
ethanol and centrifuged for 5 min. The pellet was washed
with 70% ethanol and resuspended at a concentration of 1
µg/µl. The annealing reaction was carried out with 4 µg
of template DNA and 40 ng of primer in 1x annealing
buffer in a final volume of 10 µl. The reaction was
heated to 65°C and slowly cooled to 37°C. In a separate
tube 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of
dH2O, 1 µl of [35S] dATP (10 µCi), and 0.25 µl of
Sequenase (12 U/µl) were added for each reaction. Five
µl of this mix were added to each annealed primer-
template tube and incubated for 5 min at room
temperature. For each labeling reaction 2.5 µl of each
of the 4 termination mixes were added on a Terasaki plate
and prewarmed at 37°C. At the end of the incubation
period 3.5 µl of labeling reaction were added to each of
the 4 termination mixes. After 5 min, 4 µl of stop
solution were added to each reaction and the Terasaki
plate was incubated at 80°C for 10 min in an oven. The
sequencing reactions were run on 5% denaturing
polyacrylamide gel. An acrylamide solution was prepared
by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to
100 g of acrylamide:bisacrylamide (29:1). 5%
polyacrylamide 46% urea and 1x TBE gel was prepared by
combining 38 ml of acrylamide solution and 28 g urea.
Polymerization was initiated by the addition of 400 µl of
10% ammonium peroxodisulfate and 60 µl of TEMED. Gels
were poured using silanized glass plates and sharktooth
combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4
hours (depending on the region to be read). Gels were
transferred to Whatman blotting paper, dried at 80°C for
about 1 hour, and exposed to x-ray film at room
temperature. Typically exposure time was 12 hours. The
following solutions were used in these procedures:
5 Annealing buffer (200 mM Tris HC1 „h -r * 

r« " "° mm each dNTp - » * . 

M 7 . 2 ° mEDTA ' ; *>lr««yl«l«. solution 
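The labeling mix described in the sequencing procedure can be tabulated to confirm that the per-reaction components add up to the five microliters dispensed into each annealed primer-template tube, and to scale the mix for several templates. This is a sketch only: the component volumes are those read from the text, while the function name and the idea of preparing a pooled master mix are illustrative assumptions.

```python
# Per-reaction volumes in microliters, as given in the sequencing procedure.
LABELING_MIX_UL = {
    "0.1 M DTT": 1.0,
    "labeling mix": 2.0,
    "dH2O": 0.75,
    "[35S] dATP": 1.0,
    "Sequenase": 0.25,
}


def master_mix(per_reaction_ul, n_reactions):
    """Scale each component for n reactions (no pipetting overage added)."""
    return {name: vol * n_reactions for name, vol in per_reaction_ul.items()}


per_reaction_total = sum(LABELING_MIX_UL.values())
mix_for_four = master_mix(LABELING_MIX_UL, 4)   # hypothetical batch of 4 templates
```

The per-reaction total is 5 µl, matching the volume added to each annealed tube.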

RNA isolation

Cytoplasmic RNA was isolated from calcium
phosphate transfected 293T cells 36 hours post
transfection and from vaccinia infected Hela cells 16
hours post infection essentially as described by Gilman.
(Gilman, Preparation of cytoplasmic RNA from tissue
culture cells. In Current Protocols in Molecular Biology,
1992). Briefly, cells were lysed in 400 µl lysis
buffer, nuclei were spun out, and SDS and proteinase K were
added to 0.2% and 0.2 mg/ml respectively. The cytoplasmic
extracts were incubated at 37°C for 20 min,
phenol/chloroform extracted twice, and precipitated. The
RNA was dissolved in 100 µl buffer I and incubated at
37°C for 20 min. The reaction was stopped by adding 25
µl stop buffer and precipitated again.

The following solutions were used in this
procedure: Lysis Buffer (TE containing 50 mM Tris pH
8.0, 100 mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer I (TE
buffer with 10 mM MgCl2, 1 mM DTT, 0.5 U/µl placental
RNAse inhibitor, 0.1 U/µl RNAse free DNAse I); Stop
buffer (50 mM EDTA, 1.5 M NaOAc, 1.0% SDS).

For slot blot analysis 10 µg of cytoplasmic RNA
was dissolved in 50 µl dH2O to which 150 µl of 10x
SSC/18% formaldehyde were added. The solubilized RNA was
then incubated at 65°C for 15 min and spotted onto a
membrane with a slot blot apparatus. Radioactively
labelled probes of 1.5 kb gp120IIIb and syngp120mn
fragments were used for hybridization. Each of the two
fragments was random labelled in a 50 µl reaction with
10 µl of 5x oligo-labelling buffer, 8 µl of 2.5 mg/ml
BSA, 4 µl of α-[32P]-dCTP (20 µCi/µl; 6000 Ci/mmol), and
5 U of Klenow fragment. After 1 to 3 hours incubation at
37°C 100 µl of TE were added and unincorporated
α-[32P]-dCTP was eliminated using a G50 spin column.
Activity was measured in a Beckman beta-counter, and
equal specific activities were used for hybridization.
Membranes were pre-hybridized for 2 hours and hybridized
for 12 to 24 hours at 42°C with 0.5 x 10^6 cpm probe per
ml hybridization fluid. The membrane was washed twice
(5 min) with washing buffer I at room temperature, for
one hour in washing buffer II at 65°C, and then exposed
to x-ray film. Similar results were obtained using a
1.1 kb NotI/SfiI fragment of pCDM7 containing the 3'
untranslated region. Control hybridizations were done in
parallel with a random-labelled human beta-actin probe.
RNA expression was quantitated by scanning the hybridized
nitrocellulose membranes with a Magnetic Dynamics
phosphorimager.

The following solutions were used in this
procedure: 5x Oligo-labelling buffer (250 mM Tris HCl,
pH 8.0, 25 mM MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP,
2 mM dGTP, 2 mM dTTP, 1 M Hepes pH 6.6, 1 mg/ml
hexanucleotides [dNTP]6); Hybridization Solution (__ M
sodium phosphate, 250 mM NaCl, 7% SDS, 1 mM EDTA, 5%
dextrane sulfate, 50% formamide, 100 µg/ml denatured
salmon sperm DNA); Washing buffer I (2x SSC, 0.1% SDS);
Washing buffer II (0.5x SSC, 0.1% SDS); 20x SSC (3 M
NaCl, 0.3 M Na3 citrate, pH adjusted to 7.0).
Vaccinia recombination

Vaccinia recombination used a modification of the
method described by Romeo and Seed (Romeo and Seed,
Cell, 64:1037, 1991). Briefly, CV1 cells at 70 to
90% confluency were infected with 1 to 3 μl of a wildtype
vaccinia stock WR (2 x 10^8 pfu/ml) for 1 hour in culture
medium without calf serum. After 24 hours, the cells
were transfected by calcium phosphate with 25 μg TKG
plasmid DNA per dish. After an additional 24 to 48 hours
the cells were scraped off the plate, spun down, and
resuspended in a volume of 1 ml. After 3 freeze/thaw
cycles trypsin was added to 0.05 mg/ml and lysates were
incubated for 20 min. A dilution series of 10, 1 and 0.1
μl of this lysate was used to infect small dishes (6 cm)
of CV1 cells that had been pretreated with 12.5 μg/ml
mycophenolic acid, 0.25 mg/ml xanthine and 1.36 mg/ml
hypoxanthine for 6 hours. Infected cells were cultured
for 2 to 3 days, and subsequently stained with the
monoclonal antibody NEA9301 against gp120 and an alkaline
phosphatase conjugated secondary antibody. Cells were
incubated with 0.33 mg/ml NBT and 0.16 mg/ml BCIP in
AP-buffer and finally overlaid with 1% agarose in PBS.
Positive plaques were picked and resuspended in 100 μl
Tris pH 9.0. The plaque purification was repeated once.
To produce high titer stocks the infection was slowly
scaled up. Finally, one large plate of Hela cells was
infected with half of the virus of the previous round.
Infected cells were detached in 3 ml of PBS, lysed with a
Dounce homogenizer and cleared from larger debris by
centrifugation. VPE-8 recombinant vaccinia stocks were
kindly provided by the AIDS repository, Rockville, MD,
and express HIV-1 IIIB gp120 under the 7.5 mixed
early/late promoter (Earl et al., J. Virol., 65:31,
1991). In all experiments with recombinant vaccinia, cells
were infected at a multiplicity of infection of at least
10.
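
The inoculum implied by a multiplicity of infection follows from simple arithmetic: the plaque-forming units required equal the multiplicity times the number of cells, and the volume is that number divided by the stock titer. The following Python sketch illustrates the calculation only. The function name and the cell count per dish are hypothetical (the text does not state a cell number), and the titer is the 2 x 10^8 pfu/ml figure quoted above for the wildtype WR stock, used here purely as an example value.

```python
def inoculum_volume_ul(moi, n_cells, titer_pfu_per_ml):
    """Return the stock volume (in microliters) needed to reach a given MOI."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    pfu_needed = moi * n_cells               # total plaque-forming units required
    volume_ml = pfu_needed / titer_pfu_per_ml
    return volume_ml * 1000.0                # convert ml to microliters


# Hypothetical example: 1e6 cells per dish (an assumed value), MOI of 10,
# stock titer of 2e8 pfu/ml.
volume = inoculum_volume_ul(moi=10, n_cells=1_000_000, titer_pfu_per_ml=2e8)
print(f"{volume:.1f} ul")
```

Under these assumed numbers the calculation gives 50 μl per dish; a different cell count or titer scales the result proportionally.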

The following solution was used in this procedure:
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM
MgCl2).

Cell culture 

The monkey kidney carcinoma cell lines CV1 and
Cos7, the human kidney carcinoma cell line 293T, and the
human cervix carcinoma cell line Hela were obtained from
the American Tissue Typing Collection and were maintained
in supplemented IMDM. They were kept on 10 cm tissue
culture plates and typically split 1:5 to 1:20 every 3 to
4 days. The following medium was used in this
procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco Medium,
10% calf serum, iron-complemented, heat inactivated 30
min 56°C, 0.3 mg/ml L-glutamine, 25 μg/ml gentamycin, 0.5
mM β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5
ml)).

Transfection

Calcium phosphate transfection of 293T cells was
performed by slowly adding, under vortexing, 10 μg
plasmid DNA in 250 μl 0.25 M CaCl2 to the same volume of
2x HEBS buffer. After incubation for 10
to 30 min at room temperature the DNA precipitate was
added to a small dish of 50 to 70% confluent cells. In
cotransfection experiments with rev, cells were
transfected with 10 μg gp120IIIb, gp120IIIbrre,
syngp120mnrre or rTHY-1enveg1rre and 10 μg of pCMVrev or
CDM7 plasmid DNA.

The following solutions were used in this
procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 mM
sterile filtered); 0.25 M CaCl2 (autoclaved).

After 48 to 60 hours medium was exchanged and
cells were incubated for an additional 12 hours in
Cys/Met-free medium containing 200 μCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of protease
inhibitors leupeptin, aprotinin and PMSF to 2.5 μg/ml, 50
μg/ml, 100 μg/ml respectively, 1 ml of supernatant was
incubated with either 10 μl of packed protein A sepharose
alone (rTHY-1enveg1rre) or with protein A sepharose and 3
μg of a purified CD4/immunoglobulin fusion protein
(kindly provided by Behring) (all gp120 constructs) at
4°C for 12 hours on a rotator. Subsequently the protein
A beads were washed 5 times for 5 to 15 min each time.
After the final wash 10 μl of loading buffer
was added, samples were boiled for 3 min and applied on
7% (all gp120 constructs) or 10% (rTHY-1enveg1rre) SDS
polyacrylamide gels (TRIS pH 8.8 buffer in the resolving,
TRIS pH 6.8 buffer in the stacking gel, TRIS-glycine
running buffer, Maniatis et al. 1989). Gels were fixed
in 10% acetic acid and 10% methanol, incubated with
Amplify for 20 min, dried and exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM
Tris, 1.25 M Glycine, 0.5% SDS); Loading buffer (10%
glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromphenol
blue).

Immunofluorescence

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
were stained with the monoclonal antibody OX-7 in a
dilution of 1:250 at 4°C for 20 min, washed with PBS and
subsequently incubated with a 1:500 dilution of a
FITC-conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of a
fixing solution, and analyzed on an EPICS XL
cytofluorometer (Coulter).

The following solutions were used in this
procedure:

PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
ELISA

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 μl of goat
anti-gp120 antisera diluted 1:200 were added for 2 hours. The
plates were washed again and incubated for 2 hours with a
peroxidase-conjugated rabbit anti-goat IgG antiserum
1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 μl of substrate solution
containing 2 mg/ml o-phenylenediamine in sodium citrate
buffer. The reaction was finally stopped with 100 μl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml
o-phenylenediamine in sodium citrate buffer).

Use

The synthetic genes of the invention are useful for
expression in mammalian cell culture for commercial
production of human proteins such as Factor VII and
Factor IX. The synthetic genes of the invention are also
useful for gene therapy.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: SEED, BRIAN

(ii) TITLE OF INVENTION: OVEREXPRESSION OF MAMMALIAN AND VIRAL PROTEINS

(iii) NUMBER OF SEQUENCES: 37

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Fish & Richardson
(B) STREET: 225 Franklin Street
(C) CITY: Boston
(D) STATE: Massachusetts
(E) COUNTRY: U.S.A.
(F) ZIP: 02110-2804

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30B

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: 08/308,286
(B) FILING DATE: 19-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: CLARK, PAUL T
(B) REGISTRATION NUMBER: 30,162
(C) REFERENCE/DOCKET NUMBER: 00786/226001

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (617) 542-5070
(B) TELEFAX: (617) 542-8906
(C) TELEX: 200154



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CGCGGGCTAG CCACCGAGAA CCTG
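
Each entry in the listing declares a LENGTH that can be checked against the printed sequence. The short Python sketch below shows one way to do so for SEQ ID NO: 1; the helper name `count_bases` is hypothetical, and the approach simply counts the letters A, C, G and T while ignoring the spaces between ten-base blocks and any position counters.

```python
def count_bases(listing_line):
    """Count nucleotide letters in a listing line, ignoring spaces and digits."""
    return sum(1 for ch in listing_line if ch.upper() in "ACGT")


seq_id_1 = "CGCGGGCTAG CCACCGAGAA CCTG"   # SEQ ID NO: 1 as printed above
declared_length = 24                       # from the LENGTH field of the entry

print(count_bases(seq_id_1) == declared_length)
```

A mismatch between the counted and declared lengths would point to a transcription problem in that entry rather than to a property of the sequence itself.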
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 196 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ACCGAGAACC TGTGGGTCAC CCTGTACTAC CCCCTCCCCC TCTCGAACAC ACCCCACCAC 60
CACCCTCTTC TCCCCCACCC ACCCCAACCC CTACGACACC CACCTCCACA ACCTGTCCCC 120
CACCCAGGCC TGCCTGCCCA CCCACCCCAA CCCCCAGGAG GTGGAGCTCG TCAACCTGAC 180
CGAGAACTTC AACATG 196

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CCACCATCTT GTTCTTCCAC ATGTTGAAGT TCTC

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GACCCAGAAC TTCAACATGT GGAAGAACAA CAT

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 192 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TGGAAGAACA ACATGOTOOA GCACATCCAT CAGCACATCA TCAGCCTGTG GCACCAGACC 60
CTGAAGCCCT GCCTCAACCT GACCCCCTGT CCGTGACCTG AACTGCACCG ACCTGAGGAA 120
CACCACCAAC ACCAACACAG CACCGCCAAC AACAACAGCA ACAGCCAGGG CACCATCAAG 180
GGCGGCGAGA TG 192

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTTGAAGCTG CAGTTCTTCA TCTCGCCGCC CTT 33

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CAAGAACTCC AGCTTCAACA TCACCACCAG C 31

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
AACATCACCA CCAGCATCCG CGACAAGATG CACAAGCAGT ACGCCCTGCT GTACAAGCTG 60
CATATCGTGA CCATCGACAA CGACAGCACC ACCTACCGCC TGATCTCCTG CAACACCAGC 120
GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180
CCCGCCGGCT TCGCC 195

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GAACTTCTTC TCGCCGGCGA ACCCCCCGGG 30

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 47 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GCGCCCCCCC CGCCTTCCCC ATCCTCAACT CCAACGACAA CAACTTC
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 198 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCCCACAACA ACTTCAGCGG CAAGGGCAGC TGCAAGAACG TGAGCACCGT GCAGTGCACC 60
CACGGCATCC GGCCGCTGGT GAGCACCCAC CTCCTCCTCA ACCCCACCCT GGCCGAGGAC 120
CAGGTGCTCA TCCGCAGCGA GAACTTCACC GACAACGCCA ACACCATCAT CCTCCACCTC 180
AATCAGAGCG TGCAGATC 198

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGTTGCGACG CGTGCAGTTG ATCTGCACGC TCTC
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAGAGCGTGC ACATCAACTG CACGCGTCCC 30

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 120 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AACTGCACGC GTCCCAACTA CAACAAGCGC AAGCGCATCC ACATCGGCCC CGCCCGCGCC 60
TTCTACACCA CCAAGAACAT CATCGGCACC ATCCTCCAGG CCCACTGCAA CATCTCTAGA 120

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GTCGTTCCAC TTGGCTCTAG AGATGTTGCA 30

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GCAACATCTC TAGAGCCAAG TGGAACGAC 29
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 131 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CCCAACTCCA ACGACACCCT CCCCCACATC CTCACCAACC ^CACCA GTTCAACAAC 60
AACACCATCC TCTTCACCAC ACCACCOCCC CCCACCCCCA CATCCTCATC CACACCTTCA 120
ACTCCCCCGC C 131

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCAGTAGAAG AATTCCCCGC CGCAGTTGA
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCAACTGCGG CCCCCAATTC TTCTACTGC 29

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CCCCAATTCT TCTACTGCAA CACCAGCCCC CTGTTCAACA GCACCTGGAA CGGCAACAAC 60
ACCTGGAACA ACACCACCGG CAGCAACAAC AATATTACCC TCCAGTGCAA GATCAAGCAG 120
ATCATCAACA TGTGGCAGGA GGTGGGCAAG GCCATGTACG CCCCCCCCAT CGAGGGCCAG 180
ATCCCGTCCA GCAGC 195

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CCAGACCGGT GATCTTGCTG CTGCACCGGA TCTGCCCCTC 40

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CGACCCCCAG ATCCGGTGCA GCAGCAACAT CACCGGTCTG 40
(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 242 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AACATCACCG CTCTGCTCCT CCTGCTCACC CGCACGGCGG CAAGGACACC GACACCAACG 60
ACACCGAAAT CTTCCCCGAC GGCGGCAAGG ACACCAACGA CACCGAAATC TTCCCCCCCG 120
CCGGCGGCGA CATGCGCGAC AACTGGAGAT CTCACCTGTA CAAGTACAAG GTGGTGACCA 180
TCGAGCCCCT GGGCGTGGCC CCCACCAAGG CCAAGCGCGC GGTCGTGCAG CGCGACAAGC 240

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCCGCCCGGC CGCTTTACCG CTTCTCGCCC TCCACCAC

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CCCGGGCGAT CCAAGCTTAC CATGATTCCA GTAATAACT 39

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 165 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATCAATCCAC TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATCAG TAGAGCACAA 60
AGAGTAATAA GTTTAACAGC ATCTTTAGTA AATCAAAATT TGACATTAGA TTGTAGACAT 120
GAAAATAATA CAAATTTCCC AATACAACAT CAATTTTCAT TAACG 165

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CCCGCGGAAT TCACCCCTTA ATCAAAATTC ATGTTG
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



WO 96/09378 



PCT/US95m5U 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CCCCGATCCA CGCCTCAAAA AAAAAAACAT

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 149 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CGTGAAAAAA AAAAACATGT ATTAAGTGCA ACATTAGGAG TACCAGAACA TACATATAGA
AGTAGAGTAA TTTGTTTAGT GATAGATTCA TAAAACTATT AACATTAGCA AATTTTACAA
CAAAAGATGA AGGAGATTAT ATGTGTGAG

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CGCGAATTCG AGCTCACACA TATAATCTCC
(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CGCGGATCCG ACCTCAGAGT AAGTGGACAA

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 170 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CTCACACTAA GTGCACAAAA TCCAACAAGT AGTAATAAAA CAATAAATCT AATAACACAT 60
AAATTACTAA AATCTGAGCA ATAACTTTAT TACTACAAAA TACAACTTCG TTATTATTAT 120
TATTATTAAG TTTAACTTTT TTACAACCAA CAGATTTTAT AAGTTTATGA 170
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CCCGAATTCG CGGCCGCTTC ATAAACTTAT AAAATC
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1632 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CTCGAGATCC ATTGTGCTCT AAAGGAGATA CCCGGCCAGA CACCCTCACC TGCGGTGCCC 60
AGCTCCCCAG GCTGAGGCAA GAGAAGGCCA CAAACCATGC CCATCOCCTC TCTGCAACCC 120
CTGCCCACCT TGTACCTGCT CCCCATCCTG GTCCCTTCCC TGCTACCCAC CCACAAGCTG 180
TGCGTGACCC TCTACTACGC CCTCCCCGTC TGGAAGCAGG CCACCACCAC CCTGTTCTCC 240
GCCACCGACG CCAAGGCGTA CGACACCGAG GTGCACAACG TGTGGGCCAC CCAGGCGTGC 300
GTGCCCACCG ACCCCAACCC CCACCAGCTG GACCTCCTGA ACGTGACCGA GAACTTCAAC 360
ATCTGCAAGA ACAACATGCT COACCACATC CATCAGGACA TCATCACCCT GTGCGACCAC 420
ACCCTGAACC CCTGCGTGAA GCTCACCCCC CTGTCCGTGA CCCTCAACTG CACCGACCTG 480
ACCAACACCA CCAACACCAA CAACACCACC CCCAACAACA ACAGCAACAG CGAGCGCACC 540
ATCAAGCCCG CCGAGATGAA CAACTGCACC TTCAACATCA CCACCAGCAT CCGCCACAAG 600
ATGCACAAGC AGTACGCCCT GCTGTACAAG CTGGATATCG TGAGCATCGA CAACCACAGC 660
ACCAGCTACC GCCTCATCTC CTGCAACACC ACCGTGATCA CCCAGCCCTC GCCCAAGATC 720
AGCTTCGAGC CCATCCCCAT CCACTACTGC GCCCCCGCCG GCTTCGCCAT CCTGAAGTGC 780
AACCACAACA AGTTCAGCGG CAAGCGCAGC TCCAAGAACC TGAGCACCCT GCACTCCACC 840
CACGGCATCC GGCCGGTGGT GAGCACCCAC CTCCTGCTGA ACGGCAGCCT GGCCGAGCAG 900
GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CGTGCACCTG 960
AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCGCAA GCCCATCCAC 1020
ATCGGCCCCC GCCGCGCCTT CTACACCACC AAGAACATCA TCGGCACCAT CCGCCAGGCC 1080
CACTGCAACA TCTCTAGAGC CAAGTGGAAC CACACCCTGC GCCAGATCGT GACCAAGCTG 1140
AAGGAGCAGT TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG 1200
ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC CAGCCCCCTG 1260
TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA CCACCGGCAG CAACAACAAT 1320
ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATCT GGCAGGAGGT GGGCAAGGCC 1380
ATGTACGCCC CCCCCATCGA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG 1440
CTGACCCGCG ACGGCGCCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCCCCCCCGC 1500
GGCGGCGACA TGCGCGACAA CTGGAGATCT GAGCTGTACA AGTACAAGGT GGTGACGATC 1560
GAGCCCCTGG CCGTGCCCCC CACCAAGGCC AAGCGCCGCG TGCTGCAGCG CCAGAACCCC 1620
TAAAGCGGCC GC 1632

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2481 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
ACCGAGAAGC TGTGGGTGAC CGTGTACTAC GGCGTGCCCG TGTGGAAGGA CGCCACCACC 60
ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCC AGCTGCACAA CGTCTCCCCC 120
ACCCAGGCGT GCGTGCCCAC CCACCCCAAC CCCCAGGAGC TGCAGCTCCT GAACGTGACC 180
GAGAACTTCA ACATGTGGAA GAACAACATG CTGGAGCAGA TGCATGAGGA CATCATCAGC 240
CTGTGGGACC AGAGCCTGAA GCCCTGCGTG AAGCTCACCC CCCTGTGCGT GACCCTGAAC 300
TGCACCGACC TGAGGAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360
AGCGAGGCCA CCATCAAGGG CGGCGAGATG AAGAACTGCA CCTTCAACAT CACCACCAGC 420
ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA ACCTGGATAT CGTGAGCATC 480
CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540
TCCCCCAAGA TCAGCTTCGA GCCCATCCCC ATCCACTACT GCCCCCCCGC CGGCTTCGCC 600
ATCCTGAACT GCAACGACAA GAACTTCACC GGCAAGGGCA GCTGCAAGAA CGTGACCACC 660
GTCCACTGCA CCCACCCCAT CCGGCCGGTG GTCAGCACCC AGCTCCTGCT CAACCCCACC 720
CTGCCCCAGC AGCAGCTGGT CATCCCCACC CAGAACTTCA CCGACAACCC CAAGACCATC 780
ATCCTGCACC TGAATGAGAG CGTGCAGATC AACTCCACGC GTCCCAACTA CAACAACCGC 840
AACCGCATCC ACATCGGCCC CGCGCGCGCC TTCTACACCA CCAAGAACAT CATCCCCACC 900
ATCCCCCAGC CCCACTGCAA CATCTCTAGA GCCAAGTGCA ACGACACCCT GCCCCACATC 960
CTGACCAAGC TCAAGCAGCA GTTCAAGAAC AAGACCATCC TCTTCAACCA CACCACCCCC 1020
CCCCACCCCG AGATCCTCAT CCACACCTTC AACTCCCCCC GCGAATTCTT CTACTCCAAC 1080
ACCACCCCCC TGTTCAACAG CACCTCGAAC GCCAACAACA CCTGGAACAA CACCACCCCC 1140
AGCAACAACA ATATTACCCT CCAGTCCAAG ATCAACCAGA TCATCAACAT CTGGCAGCAG 1200
CTGGCCAAGC CCATCTACGC CCCCCCCATC CACGGCCAGA TCCGGTCCAG CAGCAACATC 1260
ACCGCTCTCC TGCTGACCCG CGACGCCGCC AAGGACACCC ACACCAACCA CACCGAAATC 1320
TTCCGCCCCC GCGGCGGCGA CATGCGCGAC AACTGGAGAT CTGACCTCTA CAAGTACAAG 1380
CTGCTCACGA TCGAGCCCCT CGGCGTGGCC CCCACCAAGC CCAAGCGCCC CCTCGTGCAG 1440
CCCGACAACC GCGCCGCCAT CCGCGCCCTC TTCCTCCCCT TCCTGCCCCC GGCCGGCAGC 1500
ACCATGCGGG CCGCCAGCGT GACCCTGACC GTCCACGCCC GCCTCCTCCT GAGCGGCATC 1560
CTCCAGCAGC AGAACAACCT CCTCCGCGCC ATCGAGGCCC AGCAGCATAT GCTCCACCTC 1620
ACCGTGTGGG CCATCAAGCA CCTCCAGGCC CGCGTCCTGG CCGTGGAGCC CTACCTGAAC 1680
CACCAGCAGC TCCTGGCCTT CTGGGGCTGC TCCGCCAAGC TGATCTCCAC CACCACCCTA 1740
CCCTGCAACG CCTCCTGCAG CAACAAGAGC CTGGACGAGA TCTCCAACAA CATGACCTCC 1800
ATCCAGTGGG AGCCCGAGAT CGATAACTAC ACCAGCCTCA TCTACAGCCT GCTGCAGAAG 1860
ACCCAGACCC AGCAGCAGAA GAACGAGCAG CAGCTCCTGG AGCTGGACAA CTGGGCGAGC 1920
CTCTCGAACT GGTTCGACAT CACCAACTCC CTGTGCTACA TCAAAATCTT CATCATGATT 1980
GTGGGCGGCC TGGTGCCCCT CCGCATCGTG TTCGCCGTGC TCAGCATCGT GAACCGCGTG 2040
CGCCAGCGCT ACAGCCCCCT GAGCCTCCAG ACCCGGCCCC CCGTGCCGCG CGGCCCCGAC 2100
CCCCCCGAGC CCATCGAGGA GGAGGGCGGC CAGCGCGACC GCGACACCAC CGCCAGCCTC 2160
GTCCACCGCT TCCTGGCGAT CATCTGGGTC GACCTCCGCA GCCTGTTCCT GTTCAGCTAC 2220
CACCACCCCC ACCTGCTCCT CATCGCCGCC CGCATCGTGG AACTCCTACC CCGCCGCGCC 2280
TGGGAGCTCC TGAAGTACTG GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAAGTCC 2340
AGCGCCCTGA GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCCAGGCCAC CGACCGCGTG 2400
ATCGAGCTCC TCCAGAGCGC CGCCACCGCC ATCCTCCACA TCCCCACCCC CATCCGCCAC 2460
GCCCTCGAGA CCCCCCTGCT G 2481

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 486 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAG TACAGGACAA 60
AGAGTAATAA CTTTAACAGC ATGTTTAGTA AATCAAAATT TCACATTAGA TTGTAGACAT 120
GAAAATAATA CACCTTTCCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180
CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTG 240
TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300
GATTATATCT GTGACCTCAG AGTAAGTGGA CAAAATCCAA CAACTAGTAA TAAAACAATA 360
AATGTAATAA GAGATAAATT AGTAAAATGT GGAGGAATAA GTTTATTACT ACAAAATACA 420
AGTTGGTTAT TATTATTATT ATTAAGTTTA AGTTTTTTAC AAGCAACAGA TTTTATAAGT 480
TTATGA 486
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 485 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
ATGAACCCAG TCATCAGCAT CACTCTCCTG CTTTCAGTCT TGCAGATGTC CCGAGGACAG 60
AGGGTGATCA GCCTGACAGC CTGCCTGGTG AACAGAACCT TCCACTCGAC TGCCGTCATG 120
AGAATAACAC CAACTTGCCC ATCCAGCATC AGTTCAGCCT CACCCGAGAG AACAAGAAGC 180
ACGTCCTCTC AGGCACCCTG GGGGTTCCCG AGCACACTTA CCGCTCCCGC GTCAACCTTT 240
TCAGTGACCG CTTTATCAAG GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300
ACTACATGTG TGAACTTCGA GTCTCCGGCC AGAATCCCAC AAGCTCCAAT AAAACTATCA 360
ATCTGATCAG AGACAAGCTG GTCAAGTGTG GTGGCATAAG CCTGCTGGTT CAAAACACTT 420
CCTGGCTGCT GCTGCTCCTG CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480
TGTGA 485



What is claimed is:

1. A synthetic gene encoding a protein normally
expressed in mammalian cells wherein at least one
non-preferred or less preferred codon in the natural gene
encoding said mammalian protein has been replaced by a
preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 110% of that
expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least ten times that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

7. The synthetic gene of claim 1 wherein at least
10% of the codons in said natural gene are non-preferred
codons.

8. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.

9. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by
preferred codons.

10. The synthetic gene of claim 1 wherein at
least 90% of the non-preferred codons and less preferred
codons present in said natural gene have been replaced by
preferred codons.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral or lentiviral protein.

12. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

13. The synthetic gene of claim 12 wherein said
protein is selected from the group consisting of gag,
pol, and env.

14. The synthetic gene of claim 13 wherein said
protein is gp120 or gp160.



15. The synthetic gene of claim 1 wherein said
protein is a human protein.

16. A method for preparing a synthetic gene
encoding a protein normally expressed by mammalian cells,
comprising identifying non-preferred and less-preferred
codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and
less-preferred codons with a preferred codon encoding the same
amino acid as the replaced codon.
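
The identification and replacement steps recited in the claims can be illustrated with a minimal Python sketch. It is not the procedure used to construct the genes in the listing, only an illustration under stated assumptions: the preferred and less preferred codon sets are transcribed from the Summary, the standard genetic code is assumed, and amino acids for which the Summary lists no preferred codon (Met, Trp and, in this transcription, Glu) are reported as "unlisted" and left unchanged. The function names and the "unlisted" category are hypothetical. The example input is the first 18 nucleotides of SEQ ID NO: 36 as printed in the listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred codon per amino acid, transcribed from the Summary.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}
# Less preferred codons, transcribed from the Summary.
LESS_PREFERRED = {"GGG", "ATT", "CTC", "TCC", "GTC"}


def codons(seq):
    """Split a coding sequence into codons (spaces ignored)."""
    seq = seq.upper().replace(" ", "")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify(codon):
    """Label a codon as preferred, less preferred, non-preferred or unlisted."""
    aa = CODON_TO_AA[codon]
    if aa not in PREFERRED:
        return "unlisted"          # assumption: no preferred codon was listed
    if codon == PREFERRED[aa]:
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


def replace_codons(seq):
    """Replace every codon with the preferred codon for the same amino acid."""
    return "".join(PREFERRED.get(CODON_TO_AA[c], c) for c in codons(seq))


def translate(seq):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TO_AA[c] for c in codons(seq))


natural = "ATGAATCCAGTAATAAGT"        # first 18 nt of SEQ ID NO: 36 as printed
synthetic = replace_codons(natural)
assert translate(natural) == translate(synthetic)  # same encoded amino acids
print(synthetic)
print([classify(c) for c in codons(natural)])
```

The sketch replaces every replaceable codon; replacing only a chosen fraction, as some of the claims contemplate, would amount to applying `PREFERRED` to a subset of positions.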
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1 C7GGACATCC A77G7GC7C7 AAAGGAGA7A ZZZZZZZXGK CACCC7CACC 
5; 7GCGG7GCCC A<3C7GCGGAC CC7GAGGCAA GAGAAGGCCA GAAACCA7CC 

;:: ccatgcggtc t:?scaacc3 ctggccacct tgtacctgct gggga7cctg 
.31 crcGcrrccs tgctagccac ggagaagc7G tgggtgaccg tgtactacgg 

201 CGTCCCrGTG 75GAACGAGG CGACCACCAC C77GT7C7GC GCCACCGACG 
251 C7AACGCG7A CGACACCGAG G7GCACAACG 7G7GGGCCAC CCAGGCG7CC 
27SCCCACCG A-CGCAACCC CCAGGAGG7G GAGC77C7GA ACG7GACCGA 

3S1 GAAC77GAAC A7G7GGAAGA ACAACA7CG7 GGAGCAGA7G CATGAGGACA 

;:i "-A7CAGCC7 07GGGACGAG AGCC7GAAGC C77GCG7GAA CC7GACCCCC 

4 51 C7G7GCG7GA C :77GAAC73 CACG3ACC7G AGGAACACCA CCAACACCAA 
5C1 CAACAGCACC &CCAACAACA ACAGCAACAG C3AGGGCACC A7CAAGGGCG 

5 = 1 GCGAGA7GAA CAAC7GCAGC 77CAACA7CA CCACCAGCA7 CCGCGACAAG 

6 01 A7GCAGAAGG- A37ACGCCC7 GCTG7ACAAG C7GGATA7CG 7GAGCA7CGA 
izl CAACGACAGC AICAGC7ACC GC77CA7C7C G7GCAACACC AGCG7GA7CA 
":: C:7AGGCC7G qCCCAAGATC AGC77CCAGC CCA7CCCCAT CCAC7ACTGC 

" 1 ^~-ZZZZZZZ G P 777C3C7A7 CC7GAAG7GC AACGACAAGA AC77CAGCCG 

3 01 7AACCGCAGC TGCAAGAACG 7GAGCACCG7 GCAG7GCACC CACGGCA7CC 

=51 3GCCGG7GG7 (^GCACCrAG C77C7GC7GA ACCGCAGCC7 GGCCGAGGAG 

5:i GAGG7GG7GA rCCGCAGCCA GAAC77CAC- GACAACGCCA AGACCA7CA7 

351 CG7GCACCTG AATGAGAGC3 7GCAGA7CAA C7GCACGCG7 CCCAACTACA 

i:Cl ACAAGCGCAA ^GCA7CCAC A7CGGCCCCG GGCGCGCC77 77ACACCACC 

i:Sl AAGAACA7CA TCGGCACCAT CZZCCXGCZZ CAC7GCAACA 7C7C7AGAGC 

1101 CAAG7GGAAC CACACCC7GC GCCAGA7C3T 3AGCAAGC7G AAGGAGCAC7 

1151 77AAGAACAA CACCA7CG7G 77CAACCAGA GCAGCGGCGG CGACCCGGAG 

lid A7CGTGA7GC ACAGC77CAA C7GCGGCGGC GAA77777r7 AC7GCAACAC 

12S; CAGCGrrr7G T7CAACAGCA CC7GGAACGG CAACAACACC 7GGAACAACA 

13 01 CCACCGGCAC CAACAACAA7 A7TACCC7:7 AG7GCAAGA7 CAAGCAGA7C 

13 51 A7CAACA7G7 GGCAGGAGG7 GGGCAAGGCC A7C7ACGCCC CCCCCATCGA 

KOI GGGCCAGA7C CGG7GCAGCA GCAACATCAC CGG7C7GCTG C7CACCCCCG ^ ' 

1451 ACGCCCGCAA CGACACCGAC ACCAACGACA CCCAAA7C7T CCGCCCCGGC ( 5H6€T * Of 
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-S 01 3CCGOC3ACA T^CGCCACAA CTCCAGA'-C" iri , ^ 

aGA.C. -^.^ACA AG7ACAAGC7 

1551 OCTCACCATC ,C3T=C=CC= CACCAACOCC 

TserccAsce csasaagck ta*acckc= k ^Sc© , 0 MO^) 
i ACCwAGAAGC T37 GGG7GA7 C3737AC7AC GG-3-"r:CG 7S7CCAAGCA

= : cccgacca;:: a:™;—— zzzzzxoczx czzzxxqccz zxzzxcxczz 
::: ajgtgcacaa c;7—:ccc: a—agg.™ 3"7gc:cac ::ac:c:aac 

.:- ZZZZ^ZZXZZ 7^CACC7737 CAACGT^AftC 2AGAAC77CA ACATG73GAA 

saacaa.ca7s c7=gagcaga 7—atgagga catca7Cagc :7G7gggacc 
:si agagc:tgaa *:™- -rzrz aagc7:accc :™G7GCC7 gacc:tcaac 

7GCAC"Ar7 T~AGGAACAC Z.\ZZWZAZZ AACAACACCA CZCCC:.\CAA 
rAACA^CAAC AGCGAGGGCA CZATCAAGGG CCvCIACA7G AAGAAC7CCA 

i-: *rrrrAACA7 cArrACCAc: a?::3C3ACa agatcgagaa ccao iacccc 

" ' • "~~™TACA A3C73CA7A7 C37CAGCA7C ZACAACSACA -JCACCACCTA 
--- __._.->.._ . _ . . j ,/^^.A CACCG7CA7 C.'.^'JLA^GCC 77C7C7AAGA 

£5: t:a3C777Ca ^c::a7::::: .\tz'_ac :act rrccrrrcrc ccccrrrscr 

a-. ^ ^ w • €ww^^A..'_*. 'JAAG7TCAGC vCCAACCGCA 3C7GCAAGAA 

: : ~ --.CAs-wA^r T twA3 . ZZA ZZZXZZZZXZ -TwCCC30T3 C73AG*7ACrC 

.-»-■ ^AAw'-CCAGC 77CCCCGAGG AG-CACC73-~7 "AT(~~"ACC 

..'.CUCrr.A CC-ACAACCC -AAGAC7A7C A7C37GCAC.7 73AA73ACAO 
?:i ,'3;-3CA3A77 AAC7CCAC3C 3777CAAC7A CAArAAGCSC \AZCZZXTZZ 
" ' : ^-A773CCtr C300C:rCC7 77C7ACACr.A C7.AAGAACA7 7A7C3CCAC: 

;77-crrA^G c"ac7ccaa ca7™ tag* :-czaa~7:ga acgacaccc7 

. J .^CrACA7C C^TSAGCA^" 73AAGGAGCA "77CAACAAC AA3ACCA7C3 

- - : - T'TTrAACTA GA"A3C3GG ZZZZAZZZZZ AGA7CC7~A7 "ACACCT7C 

- : : 1 AAC7""" (^TGAA77"7 77AC7GCAAC ACCACCCZ" 7-7V-AACAC 

"ACrTwGAAC QZZXXCXXCX CC7GGAACAA ZXZZS.ZZ'JZZ AGCAACAACA 

u A7AT7AC777 C-AG73CAAC A77AAGC ( \CA . JA73AACA7 ZZZZZXZZXZ 

375GGCAAGG CCA737ACCC :::::z:A?: ^AGGGCCACA 7Z7CC7GCAC 
TAGCAACA7C ACC-C7r7'J^' ZZZZZAZZZZ ZZAZZZZZZZ .1 AG." ACAC" 

^A V.-'JA CACCwAAATI T7"3C7TT- 3CSGC""A CA7"CGC3AC 

-.^7 CCA3A7 C77.AGC737A C AAG7ACAA.~. "73G7GACGA 7C3AGC7C77 
- C •» ~ A*. AACw >-i"\"~CC3 C 3 7 SC ZZZ AC C3G3AGAAGC 
• 451 juvjw - jWtaAi Z j3w - - • -< ,TGG TGGGG . . .-.G^jGGGG GCCSOCCACC

1501 ACCA7SGGGG CCGGGAGGG7 GACGGTGAGG G7CCAGGCCG GCC7CG7GG7 

1551 GAGGGGGA7G >;TGCAGCAGC AGAACAACGT G'TGCGCGCC ATCG AGGCGG 

.£01 ACCAGCATAT ^77C-Ao\- * AG>.G7G7GGG GGATGAAGGA G C7G G AGGG G 

l£;l GGCGTGGTGG C-G.^GA^vG Z . AGGTGAAG GAGCAGGAGG TGCTCGGGTT 

1 1 1 77GGGGGTGG . - GGGG AAGG TG A7C7GGAG GAG C ACGGTA G G G 7 G G AA G G 

1*51 G GTGG7GGAG C^AGAAGAGG 1TGGAGGACA 7GTGGAAGAA C A7GAGG7GG 

Is 11 ATGG AGTGGG AGGGGGAGA7 GGA7AAC7AG AGG AGG CTG A 7 GTAGAGGGT 

1S51 G GT G G AG AAG AjGGAGAGGG AGCAGGAGAA GAACGAGCAG G AGC7GG7GG 

13 51 G7GTGG7AGA T*GAAAA7G7T CA7GA7GA7T CTGG-GGGGGG 7GG7GGGGG7 

.111 ~ 7 GGA7GG" G T-G.js.wj.—- . ja\j\. A . v ^ . -svAG «. w . ^ v.^>..,Atj>jgi. . 
1111 ~ 3GGGCGAG\j QkAt v .-rv<jsjA jAGuij^ww^ v^ut. j>- vACw -j*- ^A—ACGA^j
. „ ; „ 7G GCAGwn— TG ^Gva^wmv. . — . jj^. ja 4 A . G . <^*r*j * w o/\CG.——GGA 

:::: :cct-tc=? sttiasctac :accac:;c= ac-sctsct 3atcccc=cc 

GAGG7GG 7GAAG7AG7G 

IjC I Z7GGAACG7G CT-GGAGTA . . ^^AGGGAGGA -vGTGAAG * C» Akj v.siC<- j . 'GA 

12 3 1 3GGTGG7GAA C — A> » . — iou — jn^vjuv. Aw >--j<^^- — > > — » - 

1 -i 1 1 ATGGAGG7CjG " »GA\^AG^w^ - j^\jA^\j^\_^» A.»..^>-A\.A . — _ _ ^ _/ 
FIGURE 5
Classification System: U.S.

536/23.5, 23.72; 435/172.3

B. FIELDS SEARCHED 

Documentation other than minimum documentation that are included in the fields searched:
NONE 

B. FIELDS SEARCHED

Electronic data bases consulted (Name of data base and where practicable terms used)
APS, MEDLINE EXPRESS



Form PCT/ISA/210 (extra sheet)(July 1992)*



